Cas Protospacer Decoded

WANG ET AL., PAGE 840 

The structure of the Cas1-Cas2-DNA complex reveals the dual-forked nature of
the protospacer, the invading DNA sequence selected by Cas1 and Cas2,
and explains how its sequence and length are determined.



A Structural Core for Low Complexity 

XIANG ET AL., PAGE 829 

A chemical footprinting method reveals that polymers of low-complexity
domains exhibit similar cross-β structure in hydrogels, in liquid-like droplets,
and in nuclei of mammalian cells, suggesting a common underlying structural
basis.



Cascade Effects 

REDDING ET AL., PAGE 854

Single-molecule analysis of the bacterial Cascade complex reveals how two distinct Cas pathways are engaged based on 
the presence or absence of PAM sequences in the DNA target, providing insights into how the CRISPR machinery adapts 
to mutations that allow foreign DNA to escape immunity. 



Microtubules Mismanage Broken Chromosomes 

LOTTERSBERGER ET AL., PAGE 880 

Increased chromatin mobility at double-stranded break sites is mediated by nuclear envelope-associated proteins and
microtubule dynamics, contributing to aberrant DNA repair.



Steroid Release on Steroids

YAMANAKA ET AL., PAGE 907 

The Drosophila hormone ecdysone is released from cells through calcium-regulated vesicle trafficking, suggesting that 
steroid hormone release may be an active process rather than simple diffusion across the membrane. 



Decorated Histone Veterans 

XIE ET AL., PAGE 920 

During stem cell division, a transient phosphate modification on histone H3 distinguishes pre-existing and newly synthesized
histones and is required for the asymmetric segregation of sister chromatids: one enriched with new histones, the other
with old.



Polytene Bands = TAD 

EAGEN ET AL., PAGE 934 

Equating polytene bands with topologically associating domains (TADs) in 
interphase nuclei reveals two stable forms of folded chromatin within euchro- 
matic regions of diploid cells that are distinct from more highly structured 
heterochromatin. 



Springing Out of Shape 

CHIARUTTINI ET AL., PAGE 866 

A component of the ESCRT-III membrane fission machinery self-organizes into 
spiral springs that trigger membrane deformation when released. 

Peroxisomes on Noise Patrol

DELMAGHANI ET AL., PAGE 894 

Hypervulnerability to sound exposure in humans and mice with pejvakin muta- 
tions results from reduced adaptive proliferation of peroxisomes in response to 
the oxidative stress caused by loud sounds. 



A TAD-bit to the Right 

HU ET AL., PAGE 947 

Chromosomal domains demarcated by CTCF and cohesin binding constrain 
RAG-mediated recombination and support a linear tracking model where 
breaks at defined motifs occur in a convergent orientation. 



Endometriosis: The Beta Version 

HAN ET AL., PAGE 960 

Estrogen receptor β (ERβ) interacts with the cytoplasmic apoptotic machinery and the inflammasome complex to promote
endometriosis, a significant cause of infertility.



Aire’s Heir 

TAKABA ET AL., PAGE 975 

To promote immunological self-tolerance, Fezf2 directly regulates transcription of tissue-restricted antigen genes in the 
thymus, where it functions independent of the known self-tolerance regulator, Aire. 



m6A Coping Mechanism

MEYER ET AL., PAGE 999 

N6-methyladenosine (m6A) modification is selectively increased in the 5' UTR of mRNAs during cellular stress to promote
translation initiation through a mechanism that does not require the 5' cap or cap-binding proteins.



A Fine Look at Primary Prostate Cancer 

SCHULTZ ET AL., PAGE 1011 

Molecular analysis of 333 primary prostate carcinomas reveals substantial heterogeneity and major subtypes among patients, 
as well as potentially actionable lesions valuable for clinical management of the disease. 



A Check on Cardiomyocyte Proliferation

ALKASS ET AL., PAGE 1026 

Contrary to a recent report suggesting that a preadolescent burst of
cardiomyocyte proliferation promotes heart growth, new evidence indicates that
cardiomyocyte number expansion appears limited to the neonatal period, with
cardiomyocyte hypertrophy likely accounting for the increase in heart size.



Protective Fingerprints 

CHUNG ET AL., PAGE 988 

Systems serology reveals unique vaccine-induced “fingerprints,” highlighting 
potential markers of protection against HIV and providing a powerful method 
for comparing candidate vaccines. 

Net Works for Malaria Control



In just a couple of days, Youyou Tu, William C. Campbell, and
Satoshi Omura will meet His Majesty King Carl XVI Gustaf
of Sweden to receive the Nobel Prize in Physiology or Medicine.
Tu’s discovery of artemisinin, a potent anti-malarial, and
Campbell and Omura’s work on ivermectin, a broad-spectrum
antiparasitic drug, have helped to mitigate the burden
of parasitic diseases, especially in the developing world.
Despite these efforts and advances, stopping the spread of
preventable parasitic diseases remains an unmet goal. Bhatt
et al. (2015) now present the first comprehensive, data-driven
picture of how specific interventions for malaria control
have affected the spread of this disease over the
past 15 years.




A demonstration of how to set up a mosquito bed net. Image
courtesy of Andre Roussel, USAID.

In 1978, one child died from malaria every 6 seconds.
Today, one child still dies from malaria every minute. Malaria
causes approximately 600,000 deaths every year,
concentrated largely in sub-Saharan Africa (Hemingway,
2015). First-line treatment with the drug chloroquine, and
control of exposure to the insect that transmits the disease
by indoor spraying of the insecticide DDT, both failed, mostly
because resistance was acquired by Plasmodium, the malaria
parasite, and by anopheline mosquitoes, the malaria
vector.

The critical turning point in the fight against the disease
was the launch of the Roll Back Malaria Initiative and the
wider development agenda around the United Nations
Millennium Development Goals, kicked off in the year 2000.
The ambitious agenda: begin to reverse malaria incidence
and halt the spread of malaria by 2015. Since then, funding for
malaria control has increased 20-fold, split among
insecticide-treated bed nets and indoor residual spraying
(both of which reduce exposure to mosquitoes), prompt
treatment of clinical malaria cases, and replacement of old drugs
with highly efficacious artemisinin-based combination
therapy.



Now that the benchmark year of 2015 has been reached, 
Bhatt et al. attempt to quantify the prevalence of Plasmodium 
falciparum infection and disease incidence across sub-Sa- 
haran Africa from 2000 to 2015, as well as to define the role 
major interventions have had in causing changes in malaria 
endemicity. By modeling disease transmission, the authors
were able to generate counterfactual geospatial maps that
estimate what malaria parasite prevalence rates
would look like today without each intervention. In total, they
estimate that 663 million clinical cases of malaria were
averted between 2000 and 2015. The distribution of bed
nets alone was responsible for 68% of this improvement,
followed by 22% resulting from artemisinin-based
combination therapy and 10% from indoor residual spraying.
There are caveats to these numbers. For instance, they
vary to some extent among territories, and they are
affected by how early and at what scale each intervention
was deployed. Nonetheless, it may come as a surprise that
such a large fraction of the improvement can be attributed to
interventions focused on mosquito control.
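
To make the scale of these shares concrete, the short sketch below simply multiplies the 663 million averted cases by the three percentages quoted above. It is illustrative arithmetic only, not the geospatial model used by Bhatt et al.; the variable and function names are hypothetical.

```python
# Illustrative arithmetic only: split the estimated averted cases by the
# attribution percentages quoted in the text. Names are hypothetical.
TOTAL_AVERTED_MILLIONS = 663  # clinical cases averted, 2000-2015

attribution_percent = {
    "insecticide-treated bed nets": 68,
    "artemisinin-based combination therapy": 22,
    "indoor residual spraying": 10,
}


def averted_by_intervention(total_millions, shares_percent):
    """Return the millions of averted cases attributed to each intervention."""
    return {
        name: total_millions * pct / 100
        for name, pct in shares_percent.items()
    }


averted = averted_by_intervention(TOTAL_AVERTED_MILLIONS, attribution_percent)
for name, millions in averted.items():
    print(f"{name}: about {millions:.0f} million cases")
```

Under these figures, bed nets alone would correspond to roughly 450 million averted cases, subject to the caveats on regional variation and deployment timing noted above.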

Although the incidence of malaria has decreased to half of 
what it used to be 15 years ago, the new data suggest 
caution. Millions of people are still at risk of malaria disease 
and death in Africa, and rates of improvement slowed 
down about 5% per year in 2013. Continued distribution of
bed nets, replacement of old ones, and surveillance for
the emergence of parasite and mosquito resistance (already
documented in other areas of the world) will be essential steps to
reduce the number of disease cases.

The World Health Organization and the Roll Back Malaria
Partnership have now moved on to defining goals and priority
actions for malaria control over the next 15 years.
Beyond providing an accurate picture of the effectiveness
of malaria interventions in the recent past, the new data will
be crucial for helping policy agencies devise an optimal strategy
for the future.

Cardiomyocyte Cell-Cycle
Activity during Preadolescence



Earlier studies (Soonpaa et al., 1996) 
revealed a rapid drop-off of ventricular 
cardiomyocyte cell-cycle activity at birth 
in mice, followed by a burst of DNA syn- 
thesis during the first week of postnatal 
life which contributed to the formation of 
multi-nucleated cardiomyocytes by post- 
natal day 10 (PN10). It has recently been 
suggested that a second burst of cardio- 
myocyte cell-cycle activity occurs dur- 
ing preadolescence, between PN14 and 
PN18, resulting in a 40% increase in car- 
diomyocyte number (Naqvi et al., 2014). 
Since there was no overt change in 
mono- versus bi-nuclear cardiomyocyte
content and no change in cardiomyocyte
nuclear ploidy between PN14 and PN18, 
a 40% increase in ventricular cardiomyo- 
cyte number during preadolescence 
should result in newly synthesized DNA 
in 57% of the cardiomyocyte nuclei. 
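
The 57% figure follows from simple bookkeeping, sketched below under stated assumptions that go slightly beyond the text: every new cardiomyocyte arises from a division preceded by one round of DNA synthesis, both daughter cells retain the label, no cells are lost, and nucleation and ploidy are unchanged. The function name is hypothetical.

```python
def expected_labeled_fraction(fractional_increase):
    """Fraction of cells expected to carry newly synthesized DNA.

    Assumptions (illustrative): each new cell comes from one division of a
    pre-existing cell, both daughters are labeled, and there is no cell
    death or change in nuclei per cell.
    """
    if not 0 <= fractional_increase <= 1:
        raise ValueError("increase must lie between 0 and 1 for one round of division")
    dividing_cells = fractional_increase      # per starting cell
    labeled_cells = 2 * dividing_cells        # both daughters are labeled
    total_cells = 1 + fractional_increase     # population after the burst
    return labeled_cells / total_cells


fraction = expected_labeled_fraction(0.40)
print(f"Expected labeled fraction: {fraction:.1%}")
```

For a 40% increase this gives 0.8/1.4, or about 57%, which is the benchmark against which the measured labeling indices can be compared.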

To characterize this putative burst of preadolescent cell-cycle activity,
MHC-nLAC mice, expressing a nuclear-localized β-galactosidase reporter in
cardiomyocytes and maintained in a DBA/2J background (Soonpaa et al., 1994),
were implanted with BrdU-containing osmotic mini-pumps. Cumulative ventricular
cardiomyocyte DNA synthesis was quantitated by co-localization of
β-galactosidase and BrdU immune reactivity (Figure 1A) as described (Reuter
et al., 2014). Only low levels of cardiomyocyte DNA synthesis were detected
in mice carrying pumps from PN10 through PN19 (2.96% ± 0.55%) or from PN12
through PN19 (1.09% ± 0.33%; see also Table S1A). BrdU was detected in small
intestine crypt cells by 24 hr postimplantation, and at the end of the
labeling period (Figure 1B), confirming continuous infusion. Cumulative
preadolescent cardiomyocyte DNA synthesis was also quantitated in C57Bl/6J
inbred mice (the strain used by Naqvi and colleagues); S-phase cardiomyocytes
were identified by nuclear BrdU immune reactivity in dispersed cell
preparations (Figure 1C). Only low rates of cardiomyocyte DNA synthesis were
detected (Table S1B). Mice receiving a single BrdU injection on PN14.5, PN15,
or PN16 and analyzed on PN19 also had little labeling (Table S1C), arguing
that BrdU cytotoxicity and/or the presence of the osmotic mini-pump per se
were not confounding factors.



Figure 1. Characterization of Cardiomyocyte Cell-Cycle Parameters during
Preadolescence

(A) Example of cardiomyocyte DNA synthesis (arrow) in the heart of an
MHC-nLAC mouse carrying a BrdU mini-pump from PN10 through PN19. Panels show
Hoechst staining of DNA (blue signal), beta-galactosidase immune reactivity
(red signal) and BrdU immune reactivity (green signal). Scale bar, 10 microns.

(B) BrdU immune reactivity (green signal) in the small intestine of an
MHC-nLAC mouse carrying a BrdU mini-pump for 24 hr (left) or 9 days (right).
Arrows, villi crypts. Scale bar, 200 microns.

(C) Example of cardiomyocyte DNA synthesis (arrow) in dispersed cells from a
C57Bl/6J mouse heart carrying a BrdU mini-pump from PN10 through PN19. Panels
show Hoechst staining of DNA (blue signal) and BrdU immune reactivity (green
signal). Scale bar, 20 microns.

(D) Example of an S-phase cardiomyocyte nucleus in a PN15 heart as evidenced
by co-localization of Nkx2.5 (red signal) and Ki-67 (green signal) immune
reactivity (arrow). Scale bar, 10 microns.

(E) Example of cardiomyocyte (upper panel) and non-cardiomyocyte (lower
panel) H3P immune reactivity (green signal, arrow) in postnatal hearts.
Cardiomyocytes were identified by α-actinin immune reactivity (red signal).
Scale bar, 10 microns.

(F) Cardiac expression of mitosis-related genes on PN10, PN14, PN15, PN16,
and PN21 in C57BL/6J mice (mean ± SEM) relative to their level at P10. mRNA
levels were quantitated and normalized to 18S as described (Livak and
Schmittgen, 2001). Significance was tested using unpaired, 2-tailed Student’s
t tests with Bonferroni correction for multiple testing; bars indicate
p < 0.05 versus subsequent time point.

Cardiomyocyte cell-cycle activity was also quantitated via co-localization
of Nkx2.5 and Ki-67 immune reactivity (Figure 1D); Ki-67 is expressed from G1
phase to anaphase and thus provides a very good estimate of the fraction of a
given cell population with cell-cycle activity (Lopez et al., 1994). The
cardiomyocyte nuclear Ki-67 labeling never exceeded 1% in C57Bl/6J ventricles
between PN12 and PN16 (Table S1D), in agreement with the BrdU incorporation
data. Phosphorylation of histone H3 on serine 10 (H3P), which labels cells
from G2/M through early anaphase (Hendzel et al., 1997), was also used to
monitor cardiomyocyte cell-cycle activity. Since cardiomyocyte mitosis is
characterized by sarcomere disassembly (Engel et al., 2006), mitotic
ventricular cardiomyocytes can be identified by the presence of H3P signal
and sarcomere disassembly (Figure 1E). No mitotic cardiomyocytes were
observed after PN12 (Table S1E).

Ventricular expression levels of a panel of mitosis-related genes (encoding
anillin, aurora A, polo-like kinase 1, survivin, cyclin B1, and Ki-67) were
measured by qPCR in C57Bl/6J mice, and the relative levels of expression were
normalized to that of 18S rRNA as described (Liu et al., 2015). No changes in
transcript levels supporting the presence of a proliferative burst between
PN14 and PN16 were detected (Figure 1F). Collectively, the low rates of
cardiomyocyte DNA synthesis (analyzed by M.H.S. and L.J.F.), the low levels
of cardiomyocyte Ki-67 and H3P immune reactivity (analyzed by D.C.Z. and
F.B.E.) and the absence of transient increases in mitotic transcripts
(analyzed by C.P. and A.R.) during preadolescence are consistent with the
pattern of gradual postnatal ventricular cardiomyocyte cell-cycle withdrawal
reported earlier using single injections of tritiated thymidine to monitor
cardiomyocyte cell-cycle activity (Soonpaa et al., 1996).

Potentially trivial factors such as the 
method for defining postnatal age, strain 
differences, BrdU cytotoxicity, cell identi- 
fication in histologic samples, and the se- 
quences of the PCR primers used, were 
controlled for and thus cannot explain 
the differences in the results presented 
here and those of Naqvi and colleagues. 
Whether subtle differences in animal hus- 
bandry can explain the differences in the 
measured parameters between the two 
studies remains unclear and worthy of 
further investigation. Indeed, the current 
study does not rule out the possibility 
that a preadolescent burst in cardiomyo- 
cyte cell-cycle activity can exist, 
assuming that as of yet undefined optimal 
conditions of litter size, nutrients, etc., are 
met. However, the data presented here 
clearly indicate that preadolescent devel- 
opment did occur in the absence of a 
burst of cardiomyocyte proliferation over 
multiple litters in three independent 
breeding colonies. Consequently, a burst 
of preadolescent cell-cycle activity is not 
required for normal cardiac development. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes one supplemental table and can be found
with this article online at http://dx.doi.org/10.1016/j.cell.2015.10.037.

Cardiomyocytes Replicate
and their Numbers Increase
in Young Hearts

In mice, the heart nearly quadruples in size from early preadolescence
(postnatal day 10 [P10]) to puberty (~P35) (Naqvi et al., 2014). Because it
is widely believed that mammalian cardiomyocytes (CMs) are incapable of
replication after birth, it has been assumed that early postnatal heart
growth is driven solely by CM hypertrophy. Our findings question this view
and provide insights into how thyroid hormone may regulate an increase in CM
number during preadolescence. Alkass et al. (2015) and Soonpaa et al. (2015)
present evidence that conflicts with our conclusions; we believe that
differences in experimental design and methodology could account for the
discrepancy.

Alkass et al. (2015), using design-based stereology, concluded that the
murine CM population does not increase during preadolescence. Their
measurements of CM numbers (Figure S1C in Alkass et al.), when pooled into
groups encompassing P5-P9 and P11-P100, show an ~20% increase between the two
developmental periods (p < 0.001), indicating a CM population expansion at
some time after P9 (or early preadolescence). But the authors, based on
comparisons of CM numbers on single days, conclude that there was no increase
during preadolescence (Figure 1E in Alkass et al.). This conclusion may stem
from the large variance in their data between P11 and P100, together with a
low number of replicates. Given the reported SD of means (for CM numbers)
between P11 and P100, and with three replicates, their study is underpowered
to reliably refute even a 40% change in CM numbers.

In agreement with our findings, analysis of raw data from a study (Puente
et al., 2014) that used cell disaggregation/counting protocols to determine
the murine CM population (Figures 4G and 7I of Puente et al. [2014], data
kindly supplied by Hesham Sadek), showed an increase of 33% between P7 and
P21 (p = 0.0396), and of 29% between P7 and P14 (p = 0.029) (Student’s t
test, two-tailed).

Our counting method relies on similar tissue digestion/disaggregation
efficiencies between hearts of different ages. To validate our findings, we
calculated CM numbers in the heart by dividing total ventricular volume
(occupied by CMs) by the average CM volume (Naqvi et al., 2014), an approach
independent of digestion efficiency. This also revealed an increase in CM
numbers during preadolescence, supporting the numbers estimated by direct
cell counting. Alkass et al. offer the unsupported assertion that apparent
increases in the CM population are caused by variations in digestion
efficiency between hearts of different ages, but this is refuted by our
finding of markedly reduced CM numbers in mice of the same age when T3
biosynthesis is inhibited (Naqvi et al., 2014).

Design-based stereology, used by Alkass et al., relies on accurate
delineation of CM boundaries over several tissue planes; it is ideal for
nearly spherical neonatal CMs, but technically challenging when CMs become
elongated and branched during preadolescence. Moreover, Alkass et al.
randomly sampled only 1%-2% of the heart, which they assume to be
homogeneous. We found that CMs undergoing mitosis and cytokinesis are present
mainly in the subendocardium (Naqvi et al., 2014), indicating cellular
heterogeneity in thyroid hormone response. Hence, random assessment of a few
sites, as in Alkass et al., is likely to increase estimation errors. Together
with analysis of only a few hearts at each age, this could produce errors
that are reflected in a very high variance in their CM numbers, and thereby
obscure biologically meaningful increases in CM numbers during
preadolescence.

Alkass et al. and Soonpaa et al. were unable to detect mitoses in
preadolescent CMs. We thus sent unprocessed murine heart sections to Richard
Harvey’s laboratory (Victor Chang Cardiac Research Institute). Blinded
experiments indicated nuclear Aurora B labeling (which marks mitotic cells)
in hearts at P2, P14 (harvested 9 p.m.), and P15, but not P10 or P18
(Figure 1A and 1B and data not shown), consistent with Naqvi et al. (2014).
Soonpaa et al. declined our offer of unprocessed tissue sections from Naqvi
et al. (2014) for independent validation. Alkass et al. dispute that Aurora
B+ nuclei (Figure 3B in Naqvi et al. [2014]) are in CMs. However, we showed
unambiguous sarcomeric α-myosin heavy chain labeling in isolated Aurora B+
P15 CMs (Figure 3G in Naqvi et al. [2014]), which is also evident in a very
high-resolution micrograph of a P15 heart section (Figure S1). These data
confirm mitosis in these CMs.

Figure 1. Verifying CM Mitoses in P15 Mouse Cardiac Left Ventricle
by an Independent Laboratory from Blinded Histological Heart
Samples

(A and B) Immunohistochemical identification of mitotic CMs (red staining,
Aurora B-positive cells) in transverse cut tissue sections showing
localization in the left ventricle (LV) of P15 mice (B), with no Aurora
B-positive cells evident in the LV of the P10 heart (A). CMs are labeled with
cardiac troponin T (cTNT) (green) and nuclei are labeled with DAPI (blue).
Arrows show some Aurora B-positive CMs. Experimental procedures used for
immunohistochemistry were essentially as described (Naqvi et al., 2014). The
Aurora B antibody (Abcam, ab2254), supplied as a 1-mg/ml solution, was used
at a dilution of 1:50. Sections were prepared and antibodies provided by the
Graham laboratory. Immunohistochemical staining was performed in a blinded
manner by the Harvey laboratory.

Both Alkass et al. and Soonpaa et al. 
invoke a discrepancy between our esti- 
mate of the number of new CMs born at 
~P15 (direct counting) and estimates of 
the percentage of CMs that have under- 
gone a new S-phase (BrdU labeling). Our 
numbers are similar to those independently 
observed by Murray et al. (2015) using a 
protocol like ours and are consistent with 
extensive other data we have published 
(Naqvi et al., 2014). We suggest that 
S-phase CMs are underestimated by the 
methods used by Alkass et al. and Soonpaa et al.
We are not surprised that a
BrdU pulse on P14 labels only ~3%-5%
of CMs (Naqvi et al., 2014; Murray et al.,
2015); in rodents the time of tracer
clearance from blood ranges from 0.5 to 1 hr,
short compared with the length of S-phase 
(Duque and Rakic, 2011), and S-phase in 
CMs may not be synchronized. While 
continuous BrdU infusion avoids this limita- 
tion, the anti-proliferative effects of BrdU/ 
EdU could depress CM division, and 
toxicity may increase death of labeled cells. 
Thus the BrdU/EdU labeling index, after 
pulse or continuous administration, is not 
an infallible indicator of the extent of cell
replication (Duque and Rakic, 2011). CM
numbers after birth vary strikingly with litter 
size (Bai et al., 1990), and variations in ani- 
mal husbandry, litter size, or gender could 
affect both the timing and time-to-peak of 
environmentally programmed hormonal in- 
fluences on CM numbers; thus we have 
analyzed only male C57BL/6J litters of 
6-7 pups with confirmed birth dates. 

In summary, the findings of Alkass et al. 
and Soonpaa et al. do not refute our multi- 
ple lines of evidence indicating an in- 
crease in the CM population during pread- 
olescence (Naqvi et al., 2014). In support 
of our findings, Wulfsohn et al. (2004), us- 
ing stereology, found a doubling of CM 
numbers in rats from P25-P125, and Mol- 
lova et al. (2013), also using stereology, 
found a 3.4-fold increase in human CM 
numbers from 1-20 years. These data, 
and our evidence for markers of CM mito- 
ses, mitotic figures, acute decreases in 
cell volume consistent with CM replica- 
tion, and BrdU labeling, provide compel- 
ling evidence for CM proliferation during 
preadolescence (Naqvi et al., 2014). 
Finally, Mollova et al. (2013) and Bergmann
et al. (2009) found that most human
CMs are formed during the first 10 years of
life, that is, in preadolescence, which is
the key finding of Naqvi et al. (2014).

SUPPLEMENTAL INFORMATION 

The “gut microbiota” is rapidly becoming
a common term outside of the halls of sci- 
ence: it has been headlined in the New 
York Times, is the subject of several non- 
fiction books, and is regularly promoted 
on TV (probiotics, anyone...?). Many 
non-scientists may not remember or 
know the technical term microbiota, but 
mention the gut flora to them, and they’ll 
probably know what you’re referring to. 

Every month, new research comes out 
describing yet more findings about the 
gut microbiota or, more broadly, the hu- 
man microbiota— your collection of mi- 
crobes across all body locations. With 
each new study, more intriguing questions 
unfold as scientists try to understand the 
mechanisms involved. In this short TED 
book, Rob Knight from the University of
Colorado, Boulder, one of the main
researchers involved in the Human
Microbiome Project, and Brendan Buhler, an
award-winning science writer, provide a 
knowledgeable, up-to-date summary of 
how your gut microbiota affects many as- 
pects of your everyday life. “Allergies, 
asthma, obesity, acne: these are just a 
few of the conditions that may be 
caused— and someday cured — by the 
microscopic life inside us. The key is to un- 
derstand how this groundbreaking science 
influences your health, mood, and more,” 
state the authors. And they are right. 

In seven short chapters, we learn a 
myriad of facts about the gut microbiome 
and ourselves. Starting with the first chap- 
ter, we discover how microbial we are; in 
fact, humans are mostly composed of mi- 
crobes. The book then covers how we ac- 
quire our microbiome, its emerging role in 
sickness and in health, how it interacts 
with our brain through the fascinating
gut-microbiome axis, and how we might
be able to tailor it to our specific needs. 
The book also touches upon the impor- 
tant topic of antibiotics and their effects 
on the gut microbiota before closing with 
a chapter looking to the future, as envi- 
sioned by the authors. 

Overall, the studies referenced are 
solid, and the often-personal anecdotes 
throughout the book are generally spot- 
on and funny. We learn of the sobering 
arrival of Rob Knight’s daughter into the 
world, how our pets resemble us more 
than we think, and how bacteria could 
help us stop worrying about our waist- 
lines. The reading is easy and enjoyable, 
with simple cartoons illustrating the 
different topics at hand. Two sidebars 
broaden the scope of the book, providing 
us with a “Brief History of Bugs,” which
covers the days from Antonie van
Leeuwenhoek’s first observations of bacteria
up to when Robert Koch linked them to 
disease with his famous postulates, and 
“The Science (and Art) of Microbiome 
Mapping,” which provides a window into 
how researchers go about decoding the 
genomic content from complex collec- 
tions of microbes. The book also contains 
an addendum on “The American Gut,” an 
ongoing open-source scientific project 
led by Dr. Knight, allowing each of us to 
discover which inhabitants make up our 
own individual microbiota. Thousands of 
donations have already been made and 
sequences are publicly available, allowing 
researchers to map the incredible diver- 
sity of the gut microbiome and identify 
trends that can be further investigated in 
the lab. 

For those already involved in the field of 
human microbiome research, this book 
will not provide anything new. Of particular 
interest, though, is how Rob Knight com- 
pares your gut microbiota to a garden. 
With this analogy, he makes the very 
strong point that microbial ecology hy- 
potheses and testing will be crucial if we 
are to fully understand the extent to which 
our microbiota influences our health. 

For those who wish to learn more about 
this exciting field, this will be an easy first 
step. And for those seeking health advice 
involving their gut microbiota, Dr. Knight
provides wise advice: check the sources. 
Before making any radical change in your 
lifestyle or believing the overwhelming 
claims about a miracle-microbial solution, 
you should ask yourself, “Who says so, 
and how does he or she know?” This 
book provides helpful hints for an evi- 
dence-based approach to evaluating the 
various health claims out there. 

In short, this enjoyable read summa- 
rizes a rather complex field of science 
with some everyday scenarios and de- 
livers an optimistic, yet not unrealizable, 
vision of the future of medicine. Clearly, 
we have much to learn about ourselves 
as individual microbial ecosystems. 

In the United States, rare diseases are
defined as conditions affecting fewer 
than 200,000 individuals at any given 
time. Worldwide, an estimated 300 million 
people are affected by rare diseases, also 
known as “orphan diseases” because,
with few financial incentives for
developing new treatments, the
pharmaceutical industry did not “adopt” them. In
his book entitled Orphan, Phillip Reilly 
paints a picture of several rare diseases 
and the quest to find treatments for them. 

Recent years have witnessed a sea 
change in attention to rare diseases due 
to the convergence of regulatory, scienti- 
fic, and societal forces. In 1983, the US 
Congress enacted the Orphan Drug Act 
(ODA), providing a number of incentives 
for companies to work on rare diseases. 
In addition, the blockbuster drug model 
focusing on common, highly profitable 
conditions such as hypertension or hyper- 
cholesterolemia proved to be unsustain- 
able for the pharmaceutical industry. 
Many companies have instead turned to 
rare diseases, especially those that are 
genetic in origin and have a well-charac- 
terized biological mechanism. Finally, 
the technologies necessary to intervene 
in genetic disorders are starting to 
mature, enabling a number of therapeutic 
options. It is amidst this synergistic set of 
circumstances that Reilly’s book arrives, 
describing the advances for some of 
these disorders, the heroes (parents, sci- 
entists, clinicians, companies) behind 
these advances, and the work left to be 
done. 

Reilly brings a broad and unique 
perspective on these disorders to his 
book. With both MD and JD degrees to 
his credit and board certification in Inter- 
nal Medicine and Clinical Genetics, he 
joined the Eunice Kennedy Shriver Center 
for Mental Retardation in the mid-1980s. 
There, he provided primary care for 800 
adults in a state-run institution for people 
with severe neurological conditions. He
then joined a venture capital firm in 2008
and has been focusing on starting com- 
panies to treat rare genetic disorders. 
This wide range of experiences allows 
him to move with relative ease between 
describing the impact of these disorders 
on individuals and their families to discus- 
sing the societal and economic impacts of 
decisions surrounding these conditions. 
His work as a physician, geneticist, and
venture capitalist, together with his
evident interest in the history of medicine,
contributes to his narrative of the thrilling
advances in the area of rare diseases
and how they have touched people’s
lives.

Reilly observes, “To build a company 
with the goal of developing a novel, trans- 
formative therapy for an orphan genetic 
disorder one must understand the disor- 
der and its impact on families at a level 
that can never be attained simply by 
reading medical journals.” Accordingly, 
Orphan is full of stories about individuals 
Reilly has gotten to know under various 
circumstances. The most vivid vignettes
are those of families affected with rare 
diseases such as phenylketonuria (PKU), 
a disease that has come under control 
with dietary treatments; dystrophic epi- 
dermolysis bullosa, a life-threatening 
skin condition; and X-linked adrenoleuko- 
dystrophy, a neurological disorder pre- 
senting with cognitive decline in elemen- 
tary school years. Since ~50% of the 
people affected by rare diseases are chil- 
dren, many of the vignettes highlight their 
parents. Parents have made enormous 
contributions to the advances in rare dis- 
ease treatment by being advocates, 
lobbyists, and entrepreneurs, but above
all, courageous caregivers willing to enroll 
their children in phase I or II clinical trials. 

Other heroes in the book are physicians 
and scientists who focus on a rare disor- 
der and make it their career to seek a 
treatment. One distinguished example 
among many is Roscoe Brady, a physi- 
cian scientist and outstanding biochemist 
at the National Institute of Neurological 
Disorders and Stroke (NINDS). Brady 
focused on Gaucher disease, a disorder 
in which the lipid glucocerebroside accu- 
mulates in the body, leading to liver and 
spleen enlargement. Brady and col- 
leagues discovered that the synthesis of 
this lipid was normal, but its clearance 
by a key lysosomal enzyme was impaired 
in people with Gaucher disease. In 1966, 
20 years before the gene encoding this 
enzyme was cloned, Brady proposed 
treating Gaucher patients with enzyme 
replacement, but it was not until 1991 
that the success of such an approach 
was demonstrated in a clinical trial using 
enzyme extracted from placentas. Since 
then, a recombinant protein replacement 
approach has been used effectively to 
treat a number of diseases of lysosomal 
enzyme deficiency. 

Over the last two decades, similar ap- 
proaches have been approved for a num- 
ber of rare diseases, but patients must 
take these medications regularly for the 
rest of their lives. Newer approaches us- 
ing gene therapy are trying to change 
that, and Reilly discusses the ups and 
downs of this research in detail. Gene 
therapy is experiencing a renaissance fu- 
eled by advances in viral vector technol- 
ogy and is being tested in many ongoing 
clinical trials, including for rare diseases 
such as spinal muscular atrophy, the 
leading genetic cause of infant mortality.
To date, only one gene therapy product 
(Glybera, a drug to treat a rare lipid disor- 
der) is approved in Europe; none are 
approved in the US. Nonetheless, there 
is great promise for viral-mediated gene 
delivery to turn diseases with high mortal- 
ity into chronic conditions. 

In addition to viral gene delivery, other 
technologies are likely to expand thera- 
peutic possibilities, reduce side effects, 
and improve efficacy in rare diseases. Re- 
illy summarizes these advancements, 
such as exon skipping, RNA interference, 
induced pluripotent stem cells, and gene 
editing. According to a 2013 American 
Biopharmaceutical Research Companies 
report, more than 450 drugs are in devel- 
opment for rare diseases. While this is un- 
doubtedly significant progress, there are 
still scientific, regulatory, and economic 
hurdles to drug development for rare dis- 
eases. One major scientific obstacle re- 
mains the blood-brain barrier, which is 
impenetrable to many small molecules 
and almost all biologicals. Given that 
many rare genetic diseases affect the ner- 
vous system, technologies that can tra- 
verse the blood-brain barrier are a high 
priority. In addition, for many diseases, 
we still do not know how early treatment 
must be initiated to be effective, so 
defining the temporal windows critical 
for treatment is a major challenge. More- 
over, clinical trials may be difficult to 
design for rare diseases due to small pa- 
tient populations, variability of expres- 
sion, and lack of validated endpoints, 
especially for those affecting the central 
nervous system. These hurdles can be 
further complicated by diverse mutations
in the same disease gene that sometimes
require different treatments, as exempli- 
fied by recent treatments developed for 
cystic fibrosis patients carrying only 
certain mutations. While we have reason 
to be enthusiastic about the promise of 
mechanism-based treatments, we need 
to be realistic about these challenges so 
that we can overcome them. 

“We are all orphans” is the provocative 
title of Reilly’s final chapter. Copy-number 
variants and whole-exome analysis 
studies have revealed that each one of 
us carries rare changes in our genome 
that could affect our health. For many of 
them, we do not yet know whether there 
is any clinical significance, but some com- 
mon diseases will likely be caused by a 
combination of genetic variants, as we 
are discovering for autism spectrum dis- 
order. Highly penetrant genetic variations 
may contribute to some of the more than 
7,000 diseases that are considered rare 
diseases, affecting an estimated 20-30 
million Americans. The Precision Medi- 
cine Initiative put forth by President Ob- 
ama may therefore have the largest initial 
impact on rare diseases. 

Reilly acknowledges the societal, 
ethical, and financial considerations in 
this era of huge progress in rare disease 
research and treatment. Increased detec- 
tion during gestation and the potential 
necessity of treatment at fetal stages 
raise major ethical issues. As genetic 
diagnosis technology improves, there 
will be increased demand for master's-
level genetic counselors, who are
already scarce in our healthcare system. 
As effective therapies become available, 
there will be fewer reasons to deny ge-
netic diagnostic tests to patients. More-
over, the high cost of certain approved 
treatments has already attracted atten- 
tion. Some of the newer therapies will 
turn chronic costs into large one-time 
treatment costs. How will such costs be 
reimbursed? What will the impact be in 
the developing world? Currently, it is esti- 
mated that a quarter of the beds in pediat- 
ric hospitals are occupied by patients with 
genetic diseases. As a child neurologist 
investigating some of these rare diseases, 
I suspect that number is underestimated, 
not accounting for many undiagnosed ge- 
netic disorders. 

“This book is about the struggle to 
save the lives of children who, because 
of a roll of the genetic dice, are born 
with any one of the more than several 
thousand rare genetic disorders.” In 
Orphan, Reilly skillfully describes the 
promise and challenges of treating rare 
diseases and encourages public 
discourse about where we can be in the 
next few decades. The book will likely 
engage not only those working on or 
affected by rare diseases, but also stu- 
dents and biomedical scientists broadly. 
Different readers may be drawn to 
different aspects of the book: resilience 
of the individuals and families living with 
these conditions, progress in basic sci- 
ence and technology, or the due dili- 
gence process for a startup company. A 
combination of our resources, participa- 
tion from families, hard work from physi- 
cians and scientists, and the support of 
industry and funders will be needed to 
translate genetic discoveries into treat- 
ments that will improve the lives of those 
touched by rare diseases. 

Growing up, I believed that academics
were shielded from the clutter of everyday 
life and had the peace to think, discuss, 
perform the occasional experiment and 
think some more. After starting a research 
lab, however, I discovered there was not 
much peace in academia. The day was 
just not long enough to fit everything in, 
and though a sabbatical leave always 
lurked on the horizon, there never seemed
to be a good moment to take a break. 

After three hectic years at the helm of 
the Royal Netherlands Academy and a 
particularly productive period for the lab 
(in my virtual absence, as the postdocs 
like to point out), a sabbatical could no 
longer be postponed. With two sons at 
university, I did not need to stay in one 
place for an entire year. My years at the 
Academy made me aware of how pro- 
foundly individual scientists and their sci- 
ence are affected by their environment, so 
my sabbatical became a world tour: six 
weeks each in six countries, interspersed
with four-week stints back in Utrecht, with
great university cities serving as a home 
base from which to travel to teach and 
talk science. From Melbourne, I visited 
Sydney, Adelaide, and Brisbane, and
from the Weizmann, I went to Tel Aviv, 
Haifa, and Jerusalem. New York is next, 
then San Diego and Paris, and finally 
Hong Kong. The experience will be a 
unique opportunity to see what drives sci- 
entists today, where science is going, and 
what the local and global challenges are. 



My “Big Year” 






Stuart Firestein 

Columbia University 

Four years ago, an exceptional under- 
grad in my lab asked me for a letter of 
reference for a Masters program in the 
History and Philosophy of Science (HPS) 
at Cambridge University. When I read 
about the program, my first thought was, 
“Hell, I’d like to do this.” At the time, I 
was just finishing a book about how our 
ignorance fuels science. Three years later, 
having completed my sentence... er, term 
as department Chair, I contacted HPS 
and asked to join their Masters program 
during my sabbatical. They felt I might 
be “a bit too senior” for that, but sug- 
gested instead that I come as a Visiting 
Scholar, an offer I couldn’t refuse. 

Many scientists may consider philoso- 
phers and historians of science to be like 
birdwatchers, impacting scientists about 
as much as ornithologists affect birds. 
That may be true for benchwork, but a 
key part of our job is shaping the process 
and trajectory of science. Our perspective 
on these matters so crucial to the success 
of science is generally limited to our own 
meager individual experience. At HPS, I 
was exposed to the rigorous scholarship 
and broad view of science historians. I 
learned, among other things, how turn- 
of-the-century physics imbedded itself 
into the philosophical and social thinking 
of the culture. Might biologists today be 
missing a similar opportunity to engage 
citizens? At an age when one would think 
my most affecting experiences were 
behind me, I had the most important 
year of my life at HPS. 



Embracing the Whiteboard 




Leonie Ringrose 

Humboldt University of Berlin



Last winter, I found myself in the un- 
usual position of having some time on 
my hands. My Junior PI position at IMBA 
Vienna had come to an end, and I was still 
negotiating my senior position at Hum- 
boldt University in Berlin. Ever since a 
brief fling with kinetic mathematical 
modeling during my Ph.D. in the 1990s, 
I’ve been wanting to bring the crisp 
formalism of mathematical modeling to 
the perplexing questions of epigenetics 
but never seemed to find the time to 
pursue this. So when I met Martin Ho- 
ward, a physicist at the John Innes Centre 
in Norwich who uses stochastic modeling 
to study epigenetic memory in plants, I 
was inspired by his approach and 
managed to talk him into hosting me for 
a few months to see if we could adapt 
his models to questions in Drosophila 
epigenetics. 

Once there, I found it wonderfully 
refreshing to have the sole task of working 
on my project. I overcame my initial 
fear of giving group meetings with no 
PowerPoint slides, just a whiteboard. 
The project was a success: the combina- 
tion of Martin’s expertise and interest in 
the topic, the environment of theoretical 
thinkers, combined with my background 
in the details of the biology, turned out 
to be very fruitful. I thoroughly benefitted 
not just from taking a career break but 
also from leaving my comfort zone and 
taking a leap into the unknown. And I 
will certainly be making more use of 
modeling, and of the whiteboard, in the 
future. 

After studying cancer genetics for 23
years, I wanted to get a closer look at 
how pharma approaches cancer drug dis-
covery. To do this, I spent a 9-month
sabbatical at the Discovery Oncology 
Department at Genentech in San Fran- 
cisco. My first surprise was that the 
research atmosphere there is rather aca- 
demic, with small research groups led by 
accomplished scientists. They were very 
generous and quickly made me feel at 
home. I was especially interested to learn 
how decisions are made about which 
molecules are taken forward in clinical 
development. This led to the second sur- 
prise, namely that, in pharma, it is practi- 
cally a mantra that drugs must have 
demonstrated single agent activity before 
drug combinations are considered. Due 
to the extensive redundancy and feed- 
back loops in the major signaling path- 
ways in cancer, many good drugs may 
be abandoned early on for lack of such 
single agent activity. As one example, 
had the highly successful melanoma 
drug vemurafenib (a BRAF inhibitor) first 
been tested in BRAF mutant colon can- 
cer, the drug would have failed miserably, 
as it only shows activity in this context in 
combination with EGFR inhibition. 

So the take-home lesson from my sab- 
batical is that some perfectly good drugs 
may have been needlessly abandoned 
and might be resurrected from their 
graves with synthetic lethal screens to 
find effective drug combinations. It was 
time well spent, and I came away with 
lots of new ideas to work on! 



We are two biologists, from opposite 
sides of the country and distinct training 
backgrounds, but united in our interest 
in infectious diseases. Heran’s lab fo-
cuses on the genetics of Mycobacterium
tuberculosis pathogenesis, whereas Rus- 
sell’s lab focuses on innate immune re- 
sponses to infection. M. tuberculosis 
infects more than 2 billion people world- 
wide, causing over a million deaths annu- 
ally. There is no effective vaccine, and the 
bacterium is highly contagious, requiring 
special BSL3 procedures to be safely 
studied in the lab. On top of this, tubercu- 
losis (TB) is a complex disease where 
both the pathogen and the host are 
complicit in causing mortality. 

Because TB and immunology are both 
fields that are difficult for outsiders to 
enter, we realized that we would both 
benefit from a “mutual” sabbatical. After 
discussions starting back in 2012, and 
the intervening chaos of Superstorm 
Sandy that disrupted Heran’s lab at 
NYU, we were finally able to arrange for 
Heran to visit Berkeley for a sabbatical
to coincide with Russell’s “stay-batical” 
this fall/spring. To help facilitate interac- 
tion, we are sharing a single office at Ber- 
keley. Our immediate goals are for Russell 
to get in the lab and learn how to work 
with TB, and for Heran to think more
about TB from the perspective of the 
host. Our longer-term ambitions are to 
try to bring together our knowledge of 
microbiology and immunology to make 
progress on this difficult and globally 
challenging disease. 

Pejvakin (PJVK), a protein originally identified in Persian families with sensorineural hearing loss,
regulates peroxisomal dynamics and the antioxidant defense triggered by noise exposure in hair 
cells and auditory neurons of the inner ear. These findings bring peroxisomes to the forefront of 
noise-induced hearing loss research. 



The ability to respond to auditory cues 
is a highly prized evolutionary innovation 
in vertebrates, providing significant ad- 
vantages for communication and the 
perception of environmental stimuli. The 
mammalian inner ear is a particularly intri- 
cate structure, engineered through natu- 
ral selection for the detection of a wide 
range of sound frequencies and energies. 
Transduction of air pressure waves into 
meaningful sounds relies on a neural 
interface provided by highly specialized 
hair cells in the cochlea that transmit me- 
chanical information to primary auditory 
neurons with remarkable precision for 
subsequent decoding in the auditory cor-
tex (Figure 1A). As with all nervous tissue,
the ability of hair cells to withstand dam- 
age and self-repair is severely limited. In 
particular, their exposure to high-energy 
sounds (>100-120 dB) results in mechan- 
ical trauma leading to irreparable struc- 
tural damage and permanent hearing 
loss. Prolonged exposure to lower thresh- 
olds of high-energy sound (>85 dB) is a 
major cause of oxidative stress in hair 
cells, which may also lead to cell death 
and hearing loss (Wong and Ryan, 
2015). Indeed, noise-induced hearing 
loss (NIHL) is a main cause of auditory 
disability and one of the most prevalent 
occupational hazards (Nelson et al., 
2005). In this issue of Cell, Delmaghani 
et al. (2015) show that Pejvakin (PJVK or 
DFNB59), a protein originally linked to a 
congenital form of hearing loss in Persian 
families (Delmaghani et al., 2006), local- 
izes to the peroxisomes in hair cells and
auditory neurons and mediates a dynamic
adaptive reaction involving peroxisomal 
proliferation/fission to buffer harmful oxi- 
dative stress (Delmaghani et al., 2015). 
Furthermore, the authors conduct a suc- 
cessful proof-of-principle gene therapy 
approach to correct hearing loss in Pjvk- 
deficient mice. 

Over the last decade, a handful of 
studies linked mutations in DFNB59 to 
different forms of autosomal recessive 
sensorineural hearing loss with variable 
phenotypic manifestations; however, the 
function of the protein remained elusive 
owing to its lack of well-defined functional 
domains, sorting signals, and limited 
sequence identity with other proteins. 
Earlier studies localized Pjvk protein and
mRNA in afferent auditory neurons and 
hair cells, but it is also ubiquitously ex- 
pressed across major mouse organs (Del- 
maghani et al., 2006; Schwander et al., 
2007). 

In a systematic phenotypic assessment 
of Pjvk-deficient mice, Delmaghani et al.
discovered that young Pjvk-/- mice
develop a progressive form of sensori-
neural hearing loss caused by their litter-
mates' vocalizations, an otherwise innoc-
uous stimulus for wild-type pups. This
exacerbated susceptibility to NIHL under 
normal acoustic conditions was not due 
to gross anatomical or histological abnor- 
malities but rather to a progressive post- 
natal loss of hair cell function and number, 
suggesting that the auditory phenotype 
was a result of a specific defect in the 
cellular adaptation to normal noise levels. 



This phenotype correlated with abnor- 
mally high levels of oxidative stress 
markers in the cochlea and gene expres- 
sion changes consistent with a redox 
imbalance. Pjvk was shown to localize in 
peroxisomes, not only in sensory cells of 
the inner ear, but also in other unrelated 
cell types, where it mediates peroxisomal 
proliferation either autonomously or asso- 
ciated with experimentally induced oxida- 
tive stress. 

Oxidative stress in the inner ear has 
been long associated with noise-induced 
damage, including findings of polymor- 
phisms in peroxisomal enzymes and other 
redox systems linked to NIHL susceptibil- 
ity (Konings et al., 2007; Wong and Ryan, 
2015). In fact, antioxidant therapy has 
been tested with some success in models 
of NIHL and age-related hearing loss (Fe- 
toni et al., 2013; Heman-Ackah et al., 
2010). The study by Delmaghani et al. is 
the first one to directly point at peroxi- 
somal dynamics as a first line of antioxi- 
dant defense against normal noise expo- 
sure, without apparent mitochondrial 
involvement, and opens many unresolved 
questions with important implications for 
the biology of peroxisomes and its rela- 
tionship with other cellular functions. 

One of the major observations of the 
study is that sound exposure upregulates 
Pjvk expression in hair cells, inducing 
the proliferation or fission of pre-existing 
peroxisomes. PJVK is thus proposed 
to enable an adaptive homeostatic pro- 
gram in response to the accumulation 
of reactive oxygen species (ROS) induced 

Figure 1. The Biology of PJVK and Peroxisome Dynamics

Sensory cells in the organ of Corti (outer and inner hair cells) carry out mechanotransduction of air pressure waves into electrical signals that are transmitted to 
primary auditory neurons in the spiral ganglion, which relay these signals for the cortical representation of sound. Hair cells and auditory neurons are particularly 
sensitive to reactive oxygen species (ROS) generated by exposure to noise. Upregulation of PJVK engages an adaptive program modulating peroxisome dy- 
namics, in particular the proliferation/fragmentation of peroxisomes. PJVK locates at the surface of peroxisomes, where it may interact with peroxins and other 
proteins that mediate crucial steps during the elongation and fission of peroxisomes (PEX11), as well as proteins that regulate pexophagy (PEX3, ATM, PEX5). 
Thus, PJVK expression contributes to maintain cell viability by increasing the buffering capacity against oxidative stress in part through modulating peroxisome 
function. 



by noise (Figure 1B). Peroxisomes are 
extraordinarily dynamic ER-derived or- 
ganelles that interact with other subcellu- 
lar compartments in the regulation of 
major cellular functions related to the 
metabolic and redox status of the cell 
(Smith and Aitchison, 2013). Peroxisome 
biogenesis is crucial for normal physi- 
ology and its deficiency leads to serious 
conditions of the Zellweger spectrum, 
where hearing loss is a common symptom 
in the context of general neurodegenera- 
tion. 

Although de novo peroxisome biogen- 
esis is not regulated by PJVK, subtler 
biogenesis defects secondary to protein 
trafficking or membrane lipid alterations 
cannot be excluded. It is also not entirely 
clear whether the defects observed in 
peroxisome biology in Pjvk-/- hair cells
are due to a disruption in the peroxisomal 
fission machinery or in the turnover of this 
organelle by pexophagy (Figure 1B). Per-
oxisomes can operate as a source and a
sink of ROS, therefore cellular redox bal- 
ance depends on a delicate equilibrium 
between their biogenesis, proliferation, 
and turnover. A recent report linked the 
overproduction of ROS to increased pex- 
ophagy in a process regulated by the 
ataxia telangiectasia mutated (ATM) ki- 
nase and the peroxisome importer recep- 
tor PEX5 (Zhang et al., 2015). This
pathway might also be compromised in
Pjvk-/- hair cells, leading to the appear-
ance of dysfunctional peroxisomes. The
presence of cell-type-specific protein 
partners may also shed light on the essen- 
tial function of PJVK in the inner ear. 

In addition to NIHL, the most common 
form of auditory impairment is age-related 
hearing loss (Wong and Ryan, 2015). Ag- 
ing is a process characterized by a gener- 
alized decline in antioxidant defenses and 
PJVK might represent a possible thera- 
peutic target on this front, provided that 
its activity turns out to be sufficient for 
protecting the sensory epithelium from 
age-related oxidative damage. Gene ther- 
apy approaches using adeno-associated 
viruses to restore or enhance PJVK func- 
tion in the cochlea represent an attractive 
strategy to treat highly prevalent forms of 
hearing loss in the near future. 

Germline stem cells divide asymmetrically, producing a self-renewing stem cell and a differentiating
progenitor. Xie et al. now show that this depends on two asymmetric events that together partition a
genome copy carrying the old histones to the stem cell daughter and a copy with new, unmarked
histones to the differentiating daughter.



Post-translational modifications of the 
histones around which genomic DNA is 
wrapped are involved in virtually every- 
thing that the genome does, from the 
various steps of transcription to chro- 
matin replication or DNA damage repair. 
Are the histone modifications instructive, 
determining these events? Or are they 
just facilitators of processes initiated by 
extracellular signals? A remarkable story 
from the lab of Xin Chen, the latest install- 
ment of which appears in this issue of Cell 
(Xie et al., 2015), sheds more light on this 
question and shows that histone modifi- 
cations can also determine how chromo- 
somes are partitioned between daughter 
cells destined to have different fates dur- 
ing asymmetric cell division. 

Chen and co-workers looked at the 
germline stem cells (GSCs) in the 
Drosophila testes. These stem cells main- 
tain contact with the “hub,” a cluster of 
niche cells that provide important signals 
for “stemness.” The GSCs divide asym-
metrically to produce a daughter GSC
that remains in contact with the hub and a 
gonialblast (GB) daughter cell that moves 
away and initiates differentiation, ultimately 
to produce spermatocytes. In a remark- 
able 2012 article, Chen and co-workers 
reported that parental histone H3 and 
presumably the entire H3-H4 heterote- 
tramer segregates asymmetrically during 
GSC division so that the daughter cell 
that retains stem cell fate inherits the 
old histone H3 epigenetically marked by 
GSC-associated post-translational modifi- 
cations, while the GB daughter cell ac- 
quires newly synthesized and unmodified 
H3 (Tran et al., 2012). 

In normal symmetric cell division, the 
old histones are removed from the DNA 
just in front of the advancing replication 
fork and are restored to the DNA on the
other side of the fork approximately in
equal measure to the two daughter DNA 
molecules. Newly synthesized histones 
fill in the gaps resulting from the 2-fold 
dilution of the old histones. How the re- 
deposition of old histones takes place is 
poorly understood, but how it is rendered 
asymmetric in the GSC so that one 
daughter molecule is the preferred recip- 
ient is not understood at all. It may be 
related to the asymmetry in DNA replica- 
tion, where one strand is replicated by 
continuous elongation (leading-strand) 
while the other (lagging-strand) is repli- 
cated discontinuously in short segments. 
However, this is insufficient to account 
for the asymmetry since the genome is 
normally replicated bidirectionally from 
many origins. Another possibility sug- 
gested by the authors is that the two 
DNA strands are distinguished according 
to which is the oldest (grandparent) 
strand. Differential inheritance of the two 
parental DNA strands has been seen in 
some cases of asymmetric stem cell divi- 
sion such as mouse muscle stem cells or 
the Drosophila male GSC itself (Evano 
and Tajbakhsh, 2013; Yadlapalli and Ya- 
mashita, 2013) and has been argued to 
preserve to the stem cell the DNA strand 
with the fewest replication-induced muta- 
tions (the immortal-strand hypothesis, 
proposed by Cairns, 1975; reviewed by 
Rando, 2007). No mechanism for the 
distinction of the two strands is currently 
known. Regardless of how it is produced, 
the asymmetric distribution of the old his- 
tones suggests that their stem-cell-spe- 
cific modifications enforce GSC fate while 
the unmodified histones in the GB 
daughter leave the cell free to acquire a 
differentiation program. 

In the present paper, however, the au-
thors tackle a second aspect of the asym-
metric division. Once the asymmetric
old histone distribution is achieved on 
the replicated DNA molecules, their 
partitioning needs to be orchestrated 
such that all the chromosomes with 
the old histones go to the GSC daughter 
and all chromosomes with new histones 
go to the GB daughter cell. To study 
how this is achieved, Xie et al. used a 
two-color system previously developed 
in the Chen lab (Tran et al., 2012) in which 
a transgene expresses histone H3 labeled 
with GFP (old histone) but can be 
switched by a heat shock to express H3 
labeled with mKO (new histone). In the
mitosis that follows the administration of
the heat shock, mKO was found to deco-
rate preferentially the chromosomes that
segregate into the GB while the old 
GFP-H3 was found on the GSC chromo- 
somes. The correct alignment of chromo- 
somes at metaphase is known to involve 
the Haspin kinase (Dai et al., 2005), which 
normally phosphorylates the threonine at 
position 3 of histone H3 in late mitosis. 
Xie et al. found that, in GSCs, Haspin pro- 
duces an early phosphorylation event that 
is specific for the chromatids containing 
old histones. This earlier event apparently 
gives the “old” chromosomes the advan- 
tage in connecting with the mother 
centrosome and thus in segregating 
with the cell that retains GSC identity. 
The mother centrosome is connected to 
the surface of the GSC that contacts the 
hub, and therefore the mitotic spindle is 
oriented perpendicular to this surface 
and causes the GSC daughter to remain 
in contact with the hub while the GB 
daughter moves away (Yamashita et al., 
2007). Signaling from the hub cells is 
important. For example, hyperactivation 
of JAK-STAT signaling in the hub cells 
causes GSCs to divide symmetrically. 

and Is Linked to Cell Fate

(Top) Germline stem cells (GSCs) associated with the hub cells divide asymmetrically such that one 
daughter cell remains a GSC and associated with the hub, while the other becomes a gonialblast (GB), 
begins to differentiate, divides symmetrically, and eventually produces spermatocytes. (Bottom) The
asymmetric features begin during DNA replication in the GSC. The old histones, presumably with GSC- 
specific histone modifications, are specifically transferred to one of the daughter DNA molecules (not 
known which), while newly synthesized histones lacking modifications are deposited on the other DNA 
molecule. At the beginning of mitosis, an early wave of recruitment of the Haspin kinase phosphorylates 
histone H3T3 on old histones, which directs attachment to the maternal centrosome. As a result, at 
telophase, the chromosomes carrying the old histones segregate together to the daughter cell that re- 
mains associated with the hub and retains GSC character. The daughter cell with new histones now re- 
sponds to signals that initiate differentiation. 



producing two GSC daughters, each with 
half of the old histone H3. 

To demonstrate the role of H3T3 and its 
importance for asymmetric segregation, 
Xie et al. antagonized endogenous H3 
by expressing a transgenic H3 in which 
T3 is mutated to alanine (H3T3A), which 
cannot be phosphorylated, or to aspar- 
tate (H3T3D), which mimics constitutive 
phosphorylation. H3T3A expression in 
early germ cells greatly reduces the 
level of phosphorylated H3T3 in dividing 
GSCs. As a consequence, the pattern 
of old histone segregation becomes 
predominantly symmetric: the dividing 
GSC can no longer distinguish which 
chromosomes have the old histones and 
which have the new. Interestingly, since 
Drosophila has only three large and one 
very small chromosome, random segre- 
gation would occasionally result in the 
correct asymmetric segregation pattern: 
all of the “old” chromosomes going with 
the daughter GSC by chance. With a 
similar low frequency, the inverse asym-
metry would also be expected: all of the
“old” chromosomes going with the GB 
daughter. Remarkably, this is just what is 
observed and with a frequency close to 
that statistically expected. The expres- 
sion of the H3T3D transgene in early 
germ cells leads to incorporation of the 
phospho-mimic H3T3D in both copies of 
the DNA during replication, and therefore 
both compete equally in segregation to 
the daughter cells. Just as with the 
H3T3A transgene, this results in random- 
ized inheritance of the “old” chromo- 
somes when the GSCs divide (Figure 1). 
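
The chance expectation invoked above can be made concrete with a short calculation. This is a minimal sketch, not the analysis used by Xie et al.: it assumes that each chromosome's "old" copy goes to either daughter with probability 1/2, independently of the others. The function name and the choice of how many chromosomes to count (three large ones, or four including the very small one) are illustrative assumptions, and the expected frequency reported in the paper is not reproduced here.

```python
from fractions import Fraction


def chance_asymmetry(n_chromosomes: int) -> Fraction:
    """Probability that all 'old' chromosomes end up in one named daughter
    (for example the GSC) when each segregates at random.

    Assumes independent segregation with probability 1/2 per chromosome.
    """
    if n_chromosomes < 1:
        raise ValueError("n_chromosomes must be at least 1")
    return Fraction(1, 2) ** n_chromosomes


if __name__ == "__main__":
    # Counting only the three large chromosomes, or all four.
    for n in (3, 4):
        p = chance_asymmetry(n)
        print(f"{n} chromosomes: P(all old -> GSC) = {p} = {float(p):.4f}")
        # The inverse asymmetry (all old -> GB) has the same probability.
```

Under these assumptions the correct and the inverse asymmetric patterns are equally likely, which is why both are expected at a similar low frequency.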

Even more telling are the conse- 
quences of inappropriate segregation of 
the chromosomes carrying the old his- 
tones. In both H3T3A and H3T3D experi- 
ments, cells containing a spectrosome, 
an organelle characteristic of GSCs, 
decrease in number. In their place, next 
to the hub region, appear cells containing 
a fusome, typical of differentiating germ 
cells, suggesting that GSCs lose their 
stem cell character and begin to differen- 



tiate. In addition, the authors observe het- 
erogeneous “tumors”: clusters of cells 
that replicate actively and exhibit a 
mixture of GSC-like and GB-like features. 
This suggests that the daughter cells ex- 
press their differentiation genes variously 
in proportion to the number of “new” 
chromosomes that they received. 

To summarize, then, the old histone H3 
with its various marks is preferentially 
transmitted to one genome copy. This in- 
structs Haspin to phosphorylate preferen- 
tially the “old” chromosomes, which in 
turn are preferentially associated with 
the maternal centrosome. The cell with 
the “old” chromatin retains stem cell 
character and stem-cell-specific gene 
expression, while the cell with “new” 
chromatin is free to respond to a new pro- 
gram of gene expression leading to differ- 
entiation. At least in these cells then, sig- 
nals from the hub cells are not sufficient 
to determine “stemness,” and the histone 
modifications are both necessary and suf- 
ficient to prevent differentiation. These re- 
sults raise many fascinating questions. 
How is Haspin activated for early H3T3 
phosphorylation in GSCs? How does it 
recognize the old histone H3? What 
“reads” the phosphorylated H3T3? Most 
intriguing of all, how is the asymmetric 
distribution of old histones achieved dur- 
ing DNA replication? It is clear that new 
surprises and insights will be forthcoming 
as Chen and co-workers continue their 
study of this process. 
A specialized subset of epithelial cells in the thymus “promiscuously” transcribes thousands of 
peripheral genes to ensure that developing T cells can test their antigen receptors for dangerous 
autoreactivity. New findings by Takaba et al. indicate that the transcription factor Fezf2 acts inde- 
pendently of Aire in thymic epithelial cells to generate “genetic noise” for immunological tolerance. 



During T cell differentiation in the thymus, 
every T cell is equipped with a unique 
antigen receptor generated through so- 
matic rearrangements. Whereas this vast 
T cell receptor repertoire confers protec- 
tion against a universe of constantly 
evolving pathogens, the random nature 
of the rearrangement process inevitably 
leads to the emergence of potentially 
dangerous T cells that recognize the 
body’s own structures. Prior to being 
released into the blood circulation, imma- 
ture T cells must therefore “test” their 
antigen specificity on ligands within the 
thymic microenvironment and will un- 
dergo negative selection against autor- 
eactivity. The scope of self-antigens that 
are visible to T cells for central tolerance 
is substantially broadened through the 
constitutive expression of a plethora of 
tissue restricted antigens (TRAs) by med- 
ullary thymic epithelial cells (mTECs) 
(Derbinski et al., 2001; Klein et al., 2014). 
This phenomenon is referred to as “pro- 
miscuous gene expression,” and it has 
evolved to facilitate T cell tolerance to- 
ward self-antigens that would otherwise 
be temporally or spatially secluded from 
the immune system. In this issue of Cell, 
Takaba et al. (2015) identify the transcrip- 
tion factor Fezf2 as a key driver of promis- 
cuous gene expression in mTECs, essen- 
tial to prevent spontaneous autoimmunity 
against multiple tissues (Takaba et al., 
2015). 

Tightly controlled expression of cell- 
type-specific genetic programs is indis- 
pensable for tissue identity and homeo- 
stasis, and multiple layers of control act 
in concert to prevent ectopic gene 
expression. How do mTECs deliberately 
violate these rules in order to generate 
“beneficial genetic noise”? A break- 
through in the field emerged from studies 



on the autoimmune polyendocrine syn- 
drome (APS), a monogenically inherited 
human autoimmune disease resulting 
from mutations in the autoimmune regu- 
lator (Aire) gene. In mice, Aire was found 
to be essential for “ectopic” TRA ex- 
pression in mTECs, and knockout of 
Aire recapitulated several aspects of 
APS (Anderson et al., 2002). Strikingly, 
however, it turned out that promiscuous 
gene expression was not fully abolished 
in the absence of Aire (Derbinski et al., 
2005), suggesting the existence of addi- 
tional transcriptional regulators. Still, 
most research in the field focused on 
deciphering the molecular workings of 
Aire (Mathis and Benoist, 2009; Peterson 
et al., 2008), while largely ignoring how 
Aire-independent promiscuous gene ex- 
pression is regulated. Here, Takaba et al. 
(2015) hypothesize that any additional 
Aire-independent regulators of promiscuous 
gene expression should be differentially 
expressed between mTECs and their 
nearest neighbor, the cortical thymic 
epithelial cells (cTECs), which share a common 
precursor with mTECs but do not express 
TRAs. Among the genes that were 
differentially expressed in mTECs versus 
cTECs, they find the transcriptional re- 
gulator Fezf2 (forebrain expressed zinc 
finger 2) to be expressed in mTECs, but 
not in other thymic stromal cell types. Pre- 
vious work has implicated Fezf2 in corti- 
cospinal motor neuron differentiation, 
and Fezf2-deficient mice do not survive 
beyond weaning, explaining why a poten- 
tial role of Fezf2 in the immune system 
may have gone unnoticed. By comparing 
gene expression profiles, Takaba et al. 
(2015) demonstrate that numerous TRA 
transcripts are downregulated in Fezf2- 
deficient mTECs. Importantly, the major- 
ity of these Fezf2-dependent TRAs are 



not affected in Aire-deficient mTECs, 
and most Aire-dependent TRAs remain 
expressed in the absence of Fezf2, 
strongly supporting an Aire-independent 
and non-redundant role of Fezf2 in the 
promotion of promiscuous gene expres- 
sion. Along these lines, the authors go 
on to show that Fezf2 deficiency in thymic 
epithelium elicits a spectrum of autoim- 
mune manifestations that is partly distinct 
from those seen in Aire-deficient mice. 

Remarkably, several aspects of Fezf2’s 
role in the thymus seem to follow a funda- 
mentally different biological and mecha- 
nistic logic as compared to what is known 
about Aire (Figure 1). First, Aire’s function 
as a “transcriptional randomizer” is likely 
to have evolved simultaneously with, and 
as a consequence of, the emergence 
of adaptive T-cell-mediated immunity in 
jawed vertebrates some 500 million years 
ago (Saltis et al., 2008). By contrast, 
Fezf2’s evolutionarily conserved primary 
function seems to be that of a master 
regulator of cell fate specification in corti- 
cospinal motor neurons. Thus, Aire ap- 
pears to be a genuine “tolerance gene,” 
whereas Fezf2 instead might exemplify 
the evolutionary co-optation of a neuronal 
gene in a distinct cellular and functional 
context. Second, Aire lacks a DNA-bind- 
ing domain and seems to seek out its tar- 
gets through binding inactive chromatin 
marks prior to recruiting factors that facil- 
itate “illegitimate” transcription by gener- 
ating double-strand breaks, fostering 
mRNA processing, and releasing stalled 
RNA Polymerase (Mathis and Benoist, 
2009; Peterson et al., 2008). By contrast, 
Fezf2 is a “bona fide” transcription factor 
that directly binds to target DNA. In 
neuronal progenitors, Fezf2 binds in the 
vicinity of the transcriptional start site of 
more than 10,000 genes (Lodato et al., 

Figure 1. Fezf2 and Aire Promote Promiscuous Gene Expression through Distinct Pathways 

Takaba et al. (2015) show that Fezf2 is induced downstream of lymphotoxin β receptor (LTβR) stimulation 
in mTECs and generates “transcriptional noise” by directly binding to its target genes. By contrast, 
previous work has indicated that Aire is activated by RANK/CD40 signaling. Aire binds to inactive chromatin 
marks (unmethylated H3K4) in the vicinity of tissue restricted antigen (TRA) genes and indirectly 
promotes promiscuous gene expression through recruitment of factors that promote opening of chromatin, 
release of stalled RNA polymerase, and facilitation of mRNA maturation. Fezf2- and Aire-induced 
TRAs appear to be largely non-overlapping, resulting in a non-redundant and complementary role of both 
factors in central T cell tolerance. 



2014), which has obvious implications for 
the spectrum of genes that might be 
controlled by Fezf2 in mTECs. Finally, 
whereas Aire expression in mTECs is 
regulated by the TNF superfamily mem- 
bers RANK and CD40, Takaba et al. 
(2015) show that the expression of Fezf2 
critically depends upon the lymphotoxin 
(LT) β signaling axis. 

Taken together, the work by Takaba 
et al. (2015) represents a major step for- 
ward in our understanding of how promis- 
cuous gene expression in mTECs is 



brought about and how it safeguards 
against autoimmunity. At the same time, 
these exciting new insights raise a num- 
ber of questions: do the altered 3D organi- 
zation of the thymic medulla and subtle 
alterations in the ratio of mTEC subsets 
in Fezf2-/- mice contribute to faulty T cell 
selection independent of Fezf2’s role in 
promiscuous gene expression? Given 
the perplexing observation that individual 
mTECs only express a subset of Aire- 
dependent TRAs, does the same apply 
to Fezf2-dependent transcripts? Does 



Fezf2 similarly promote promiscuous 
gene expression in the human thymus, 
and if so, are mutations or allelic variants 
of Fezf2 associated with human autoim- 
mune diseases? And finally, does the 
complementary action of Aire and Fezf2 
account for the full extent of promiscuous 
gene expression in mTECs, or are as-yet- 
unknown, additional, and independent 
factors involved? 
Two studies by Meyer et al. and Wang et al. demonstrate a role for m6A modification of mRNA in 
stimulating translation initiation. These findings add to the growing number of diverse mechanisms 
for translation initiation in eukaryotes. 



The control of translation initiation is a crit- 
ical aspect of modulating protein produc- 
tion, particularly when rapid responses to 
extracellular cues are required, such as 
during neuronal stimulation or stress con- 
ditions (Sonenberg and Hinnebusch, 
2007). Translation initiation requires the 
delivery of the small 40S ribosomal sub- 
unit to the mRNA. In eukaryotes, this is 
primarily achieved in a mechanism that 
begins with binding of the 5' mRNA cap 
by the eIF4F complex, which recruits the 
40S subunit pre-bound to a multifactor 
complex, including eIF3, eIF2, and the 
initiator tRNA (Figure 1A). The ribosome 
then scans along the 5' UTR to the AUG 
start codon, followed by joining of the 
large ribosomal subunit, producing a 
translation competent complex. In a sec- 
ond mechanism, specific mRNA struc- 
tures referred to as internal ribosome 
entry sites (IRES) can recruit the 40S sub- 
unit either by binding to one of the initia- 
tion factors, which then recruits the 40S 
subunit, or by direct interaction with the 
40S subunit, as in the case of CrPV IRES 
(Figure 1B) (Sonenberg and Hinnebusch, 
2007). Two papers in this issue of Cell 
(Meyer et al., 2015; Wang et al., 2015) 
and a third study (Zhou et al., 2015) 
now argue that m6A modifications in 
mRNA can promote translation initiation 
and suggest two possible mechanisms 
by which such RNA modifications can 
lead to ribosome recruitment (Figures 1C 
and 1D). 

Convincing evidence that m6A modifications can stimulate translation comes 
from the observations that uncapped m6A-containing mRNAs are much more 
efficiently translated in cell-free extracts than unmodified mRNAs, and 
m6A-modified mRNAs assemble translation initiation complexes in 
reconstituted systems in the absence of the eIF4F complex, unlike 
unmodified mRNAs (Meyer et al., 2015). Strikingly, a single m6A in the 5' 
UTR is sufficient to boost cap-independent translation both in extracts and 
when mRNAs are introduced into cells by transfection (Meyer et al., 2015; 
Zhou et al., 2015). In vivo evidence comes from the finding that depletion 
of the METTL3 m6A methyltransferase reduces ribosome occupancy on mRNAs 
with 5' UTR m6A modification sites (Meyer et al., 2015) and on mRNAs that 
are bound by YTHDF1, an m6A-binding protein (Wang et al., 2015). Moreover, 
for the Hsp70 mRNA, the extent of m6A modification corresponds to the rate 
of protein production and polysome occupancy during heat shock (Meyer et 
al., 2015; Zhou et al., 2015). Finally, transfected mRNAs with a cap unable 
to stimulate translation are effectively translated under stress conditions 
if they contain a 5' UTR m6A modification (Zhou et al., 2015). 

Meyer et al. (2015) provide three observations suggesting that m6A 
stimulates cap-independent translation through interactions with eIF3, 
thereby leading to ribosome recruitment (Figure 1C). First, in a 
reconstituted system, eIF3 preferentially cross-links to RNA with m6A 
modifications. Second, in vivo, eIF3-binding sites defined by cross-linking 
significantly overlap with m6A modification sites in 5' UTRs. Third, 
overexpression of the FTO demethylating enzyme reduces the association of 
5' UTR m6A-modified mRNAs with eIF3. Interestingly, the authors demonstrate 
that eIF3 prefers to bind m6A-modified mRNA when the modification is within 
the expected GAC sequence context. This may correlate with the observation 
that m6A is not able to stimulate translation in all 5' UTRs, demonstrating 
the importance of context (Zhou et al., 2015). However, whether this 
observation is due to differences in eIF3 interactions has not been 
determined. 

In contrast, several observations lead Wang et al. (2015) to suggest that 
m6A modifications in the 3' UTR, and possibly the coding region, may 
enhance translation by binding the C-terminal domain of the YTHDF1 
m6A-binding protein, which then recruits the translation initiation complex 
through its N-terminal domain (Figure 1D). First, knockdown of YTHDF1 leads 
to reduced ribosome occupancy on mRNAs bound by YTHDF1. Second, tethering 
the N-terminal domain of YTHDF1 to an mRNA leads to some increase in 
translation. Finally, YTHDF1 co-purifies with a large number of proteins, 
including eIF3, in an RNase-resistant manner, suggesting that the 
interaction with eIF3 allows YTHDF1 to promote translation of m6A-modified 
mRNA (Wang et al., 2015). Interestingly, Meyer et al. (2015) do not see 
changes in translation profiles in YTHDF1 knockdown cells when examining 5' 
UTR, 3' UTR, or all m6A-modified mRNAs, suggesting that the effect of 
YTHDF1 on translation would be limited to a subset of m6A-modified mRNAs. 

A number of questions remain. Do these two proposed mechanisms for m6A 
stimulation of translation cooperate or compete in different contexts? How 
does the growing number of m6A-binding proteins (YTHDF1, YTHDF2, eIF3, 
etc.) recognize specific binding sites? eIF3 interacts preferentially with 
m6A modifications found in the 5' UTR, but these are a minority of such 
modifications in the transcriptome. What other protein factors or local 
mRNA features define an eIF3-binding site to prevent binding in other 
regions of the mRNA? How is the competition between m6A-binding factors 
properly balanced? Finally, since methylation is reversible, like many 
chromatin modifications, it will be important to determine the mechanisms 
regulating the rates of methylation and demethylation of specific sites. 

Figure 1. Mechanisms of Translation Initiation in Eukaryotes 

(A) Cap-dependent translation initiation. The eIF4F complex binds the 5' 
cap of mRNA and then recruits the 40S ribosomal subunit pre-bound to a 
multifactor complex, including eIF3, eIF2, and the initiator tRNA, to start 
translation initiation. 

(B) IRES-stimulated translation initiation. Some mRNAs contain specific 
IRES structures that recruit the 40S subunit either indirectly by binding 
to one of the initiation factors or directly by interacting with the 40S 
subunit. 

(C) 5' UTR m6A-mediated translation initiation. Translation initiation is 
stimulated by m6A modification of the mRNA 5' UTR via direct recruitment of 
eIF3. 

(D) YTHDF1-mediated translation initiation. Translation initiation is 
stimulated by m6A modification of the 3' UTR of mRNA through recruitment of 
YTHDF1, which subsequently recruits the translation initiation complex. 

(E) Ribosome shunting. Ribosomal RNA base pairs with mRNA, leading to the 
translocation of the 40S subunit from the cap region to internal start 
codons for initiation. 

(F) Repeat-associated non-AUG (RAN) translation. Translation initiation can 
occur at disease-associated CAG repeats. 

A broader point from these papers is that eukaryotic cells contain a 
growing diversity of mechanisms for translation initiation, which has 
implications for our understanding of the predicted proteome. In addition 
to canonical cap-dependent translation, IRES, and now m6A 
modification-stimulated initiation, other mechanisms exist (Figures 1E and 
1F). For example, ribosome shunting involves the translocation of 40S 
ribosomes from the cap region to internal sites for initiation, which can 
lead to the use of internal AUGs, and/or the skipping of 5' UTR RNA 
structures that would otherwise inhibit translation (Figure 1E) (Chappell 
et al., 2006). A mechanism by which ribosomes might be recruited to mRNAs 
independent of the cap is suggested by the binding of eIF3 to stem loops in 
specific 5' UTRs, which can result in either stimulation or inhibition of 
translation (Lee et al., 2015). Ribosome profiling studies have also 
identified translation initiation sites at near-cognate start codons, 
suggesting that the start site, as well as the initiation complex, is 
malleable (de Klerk and ’t Hoen, 2015). A striking example of an unexpected 
mode of translation is seen in the case of repeat-associated non-AUG (RAN) 
translation, which occurs at disease-associated CAG repeats (Figure 1F) (Zu 
et al., 2011). Although the mechanism of RAN translation is not known, a 
reasonable prediction is that cells use that same non-AUG-dependent mode of 
translation in some context. 

One has to anticipate that cells use additional yet-to-be-discovered 
mechanisms to recruit ribosomes to mRNAs. For example, it has been 
suggested that some mRNAs recruit eukaryotic ribosomes by direct base 
pairing to rRNAs, similar to the bacterial mechanism of initiation in which 
the Shine-Dalgarno sequence 5' of the start codon base pairs to the small 
ribosomal subunit (Deforges et al., 2015). Moreover, one speculates that 
evolution is likely to have chanced upon sequence-specific RNA-binding 
proteins that interact with eIF3 or other initiation factors to recruit the 
40S subunit in a cap-independent manner. Finally, it remains possible that 
other mRNA base modifications will also stimulate translation initiation in 
some context. 

The growing diversity of translation initiation mechanisms allows the cell 
to preferentially control the translating population of mRNAs under 
different conditions. For example, cap-dependent translation is inhibited 
when the TOR pathway is inactive, such as under nutrient deprivation or 
stress. However, to survive such conditions, the cell must produce 
stress-response proteins, which can be done by utilizing cap-independent 
mechanisms of initiation. Consistent with this view, Meyer et al. (2015) 
observe that Hsp70 translation is stimulated via m6A during heat shock, 
when cap-dependent translation is inhibited. They also analyze m6A 
modification across the genome under heat and UV stress and find that m6A 
modifications specifically increase in the 5' UTR during stress. The 
increase in m6A 5' UTR modifications during heat shock may be due to 
nuclear import of YTHDF2 during heat stress, which allows it to compete 
with the demethylase FTO (Zhou et al., 2015). Importantly, many known 
variations of translation initiation have been identified under conditions 
considered non-standard, such as during development or under stress. As 
shown by Meyer et al. (2015) for m6A modifications, these variations of 
translation initiation may be functional during normal growth conditions 
but are likely more active during conditions in which inhibition of 
cap-dependent translation allows alternative mechanisms to be more 
competitive. Thus, studies of translation mechanisms in non-traditional 
cellular conditions may reveal an even broader set of translation 
initiation mechanisms. 
Recent advances in single-cell sequencing hold great potential for exploring biological systems 
with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mu- 
tations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can 
elucidate the cell type composition of a sample. However, single-cell sequencing comes with major 
technical challenges and yields complex data output. In this Primer, we provide an overview of 
available methods and discuss experimental design and single-cell data analysis. We hope that 
these guidelines will enable a growing number of researchers to leverage the power of single-cell 
sequencing. 



Introduction 

Understanding the development and function of an organ re- 
quires the knowledge of its constituents, i.e., of all the different 
cell types the organ is composed of. It is still common practice 
to distinguish cell types based on a small set of marker genes. 
These can be used to isolate sub-populations of cells, e.g., by 
fluorescence-activated cell sorting (FACS), which can then be 
characterized by population-based assays such as next-gener- 
ation sequencing. This approach is inherently constrained, since 
a pre-selection of marker genes limits the resolution and vari- 
ability within a marker-gene-expressing sub-population of cells 
cannot be resolved. Moreover, even cells of the same type can 
show substantial gene expression variability leading to pheno- 
typic variation (Eldar and Elowitz, 2010; Munsky et al., 2012; 
Snijder and Pelkmans, 2011). The ideal approach to profile the 
cell type composition of an organ or to explore transcriptome 
heterogeneity across cells of the same type is a separate anal- 
ysis of individual cells randomly drawn from a sample. Single- 
cell analysis of a small number of genes can be performed with 
imaging-based methods such as single-molecule fluorescence 
in situ hybridization (Raj et al., 2008) or by flow cytometry, ex- 
ploiting cell surface markers or fluorescent reporter proteins. 
Single-cell transcriptome analysis, on the other hand, is an 
experimental approach to obtain an unbiased view of all mRNAs 
present in a cell. Already by 1992 the expression of selected 
genes in individual neurons had been quantified by Southern 
blotting after amplifying the entire pool of mRNAs from a cell 
(Eberwine et al., 1992). Single-cell transcriptome sequencing 
was first applied by the Surani laboratory in 2009 
(Tang et al., 2009). Over the last five years, a number of single- 
cell mRNA-sequencing methods with improved sensitivity and 
reduced technical noise have been introduced (Hashimshony 
et al., 2012; Islam et al., 2011, 2014; Picelli et al., 2013; Ramsköld 
et al., 2012; Sasagawa et al., 2013). These methods have been 
used to discriminate cell types in healthy tissues (Jaitin et al., 



2014; Zeisel et al., 2015), to study differentiation dynamics 
(Treutlein et al., 2014), to discover rare cell types (Grün et al., 
2015), to investigate the transcriptome response upon external 
signals (Shalek et al., 2013, 2014), or to profile tumor heteroge- 
neity (Patel et al., 2014). 

The genotypic variation that underlies cell-to-cell differences 
can be explored by single-cell genomics. In a landmark study, 
sequencing of the genomic DNA from single tumor cell nuclei 
was employed to profile chromosome copy numbers in order 
to elucidate clonal expansion and tumor evolution (Navin 
et al., 2011). Subsequently, a number of improved methods 
have been published permitting the detection of genomic copy 
number variations and other structural rearrangements with 
increasing spatial resolution (Falconer et al., 2012; Gole et al., 
2013; Wang et al., 2012; Zong et al., 2012). 

In this Primer, we give an overview of the available techniques 
for genome and transcriptome sequencing, discuss the specific 
aspects and limitations of each method, and propose guidelines 
for designing single-cell sequencing experiments. Since any sin- 
gle-cell sequencing technique is based on amplification of min- 
ute amounts of material leading to substantial technical noise 
(Brennecke et al., 2013; Grün et al., 2014), data processing 
and analysis require extra care. We will discuss in depth all 
necessary steps for data acquisition, filtering, and analysis, 
with a focus on single-cell transcriptomics. 

Isolating Single Cells for Sequencing 

To perform any kind of single-cell sequencing assay, individual 
cells first have to be isolated from the system of interest. The 
method of choice to purify thousands of single cells is FACS. 
With unrestricted sorting gates, random samples of cells can 
be purified. Alternatively, sorting gates can be set based on 
scatter properties reflecting the morphology and composition 
of a cell. Fluorescently labeled antibodies against cell surface 
markers provide another strategy to purify sub-groups of cells. 
Current technology permits the simultaneous measurement of 
up to 20 parameters per cell and thus highly specific sub-groups 
of cells can be isolated by FACS (Chattopadhyay and Roederer, 
2012). These can be sorted directly into 96- or 384-well plates 
amenable to subsequent single-cell sequencing. Importantly, 
the parameter information can be allocated to each well. How- 
ever, flow cytometry requires a large starting volume, and sorting 
errors can lead to wells with cell doublets or empty wells. 

Micromanipulation provides an alternative approach when 
only a few cells are available and visual inspection of a cell is 
desired prior to sequencing. Here, cells are aspirated with a 
glass micropipette under the microscope. However, this method 
is very laborious and not well suited to high-throughput single- 
cell analysis. 

More recently, microfluidic devices have become available that 
enable sorting single cells into individual compartments where 
cells can be visually monitored and further processed. The Fluidigm 
C1 autoprep system is particularly suited to single-cell 
sequencing (Islam et al., 2014; Pollen et al., 2014). A shortcoming 
of this method is the fixed chip architecture that limits the selec- 
tion of cells to a certain size window. A more detailed discussion 
of single-cell isolation methods has been published recently 
(Saliba et al., 2014). 

Comparison of Whole-Genome Amplification 
Techniques 

Being able to sequence the genome of individual cells permits 
the investigation of many relevant questions. Over the lifetime 
of an organism, cells undergo multiple rounds of division. During 
each cell division, DNA replication errors can escape the DNA 
repair machinery with a small probability and can lead to 
so-called somatic mutations, which can give rise to cancer (Alex- 
androv and Stratton, 2014) and other diseases (Biesecker and 
Spinner, 2013). Moreover, a surprisingly high frequency of chro- 
mosomal abnormalities has been observed during mammalian 
germline development (Nagaoka et al., 2012). All types of germ- 
line and somatic genome mutations, comprising substitutions, 
insertions and deletions (indels), copy number variations (CNV) 
and structural rearrangements, can in principle be detected by 
DNA sequencing. Moreover, genetic inheritance can be studied 
by quantifying maternal and paternal allele frequencies based 
on single-nucleotide polymorphisms (SNPs). However, a single 
mammalian cell contains less than 10 pg of DNA, necessitating 
whole-genome amplification (WGA) prior to sequencing or micro- 
array-based analysis. Currently available WGA principles are 
based on polymerase chain reaction (PCR), multiple displace- 
ment amplification (MDA), or a combination of the two. PCR- 
based strategies initiate amplification by either priming with 
random oligonucleotides (Cheung and Nelson, 1996; Zhang 
et al., 1992) or by universal adaptors that are ligated to DNA 
fragments after enzymatic digestion (Klein et al., 1999). MDA utilizes 
isothermal amplification by a DNA polymerase with strand 
displacement activity, typically Φ29, initiated by random priming 
of denatured DNA (Dean et al., 2002). The polymerase possesses 
high processivity and generates DNA amplicons up to 10 kb in 
length. Upon contact between the 3' end of an amplicon and 
the 5' end of an adjoining amplification product during synthesis, 
the latter gets displaced, liberating the strand for further amplifi- 



cation. All available methods introduce technical artifacts origi- 
nating from non-uniform genome coverage, in particular due to 
biased amplification of sequence rich in cytosine and guanosine 
(GC-bias), preferential allelic amplification or allelic dropout, base 
copy errors, and chimeric DNA molecules (Macaulay and Voet, 
2014). Since the prevalence of a particular type of error depends 
on the method, the experimental technique should be selected 
based on the desired readout. In general, random primed PCR- 
based methods achieve a highly uniform amplification but yield 
only sparse coverage of the genome and are therefore well suited 
for low-resolution copy number variant detection down to a 
length scale of 60 kb (Mohlendick et al., 2013). Due to the high 
processivity in combination with the strand displacement activity, 
a much better genome coverage can be achieved with MDA. 
Together with the high fidelity of the Φ29 polymerase, this method 
is better suited for SNP calling. On the other hand, MDA yields 
highly non-uniform amplification, and the observed biases are 
only partially explained by the GC bias. This implies the risk of 
false positives if MDA is used for CNV detection. Moreover, 
both PCR- and MDA-based techniques produce chimeric DNA 
molecules, introducing artifacts that can be interpreted as indels 
or structural rearrangements (Voet et al., 2013). 

A technique for obtaining broad coverage of the genome 
together with uniform amplification was recently developed 
that combines pre-amplification by a polymerase with strand- 
displacement activity and amplification by PCR (Zong et al., 
2012). The method, termed multiple annealing and looping- 
based amplification cycles (MALBAC), pre-amplifies DNA with 
a strand-displacement polymerase and generates amplicons 
with complementary ends. This complementarity induces loop 
formation and prevents the amplicon from being used as a tem- 
plate during subsequent cycles to attain close-to-linear amplifi- 
cation. After five cycles of pre-amplification, the material is 
amplified exponentially by PCR. Sequencing of MALBAC-ampli- 
fied material from a single cell yielded 93% genome coverage at 
an average 25× sequencing depth. Due to the improved uniformity
and a substantially lower allele dropout rate in comparison
to MDA (~1% for MALBAC versus ~31%-65% for MDA [Leung
et al., 2015; Zong et al., 2012]), MALBAC shows higher detection
efficiency for SNPs and CNVs. The residual false-positive rate of 
MALBAC (~4 × 10⁻⁵) is due to the relatively low fidelity of the
polymerase and could be reduced by sequencing two or three 
daughter cells derived from the same mother cell. MALBAC is 
therefore well suited for the simultaneous characterization of 
SNPs and CNVs. 

Another strategy to eliminate amplification biases and alleviate 
non-uniformity of genome coverage inherent to MDA is the 
reduction of the reaction volume, for instance, by using nano-liter 
reaction wells (Gole et al., 2013). This method, termed micro-well
displacement amplification system (MIDAS), reduces reaction
volume by ~1,000-fold in comparison to conventional MDA,
thereby increasing the effective template concentration and 
reducing contamination. Traditional whole-genome amplifica- 
tion requires extensive purification in order to reduce environ- 
mental contamination. In another study, a nano-liter reaction vol- 
ume was obtained by applying microfluidics to WGA, thereby 
minimizing amplification error and yielding an extremely low error 
rate of 4 x 1 (Wang et al., 2012). A more detailed comparison
of the available WGA methods has been presented elsewhere
(Macaulay and Voet, 2014). 

Following WGA, quantification can be performed either by 
DNA microarrays or by next-generation sequencing. Microarrays 
can resolve larger CNVs, down to less than 100 kb (Mohlendick 
et al., 2013), and SNP arrays have been used to infer genome- 
wide haplotypes from a single human cell with high accuracy 
(Fan et al., 2011). Moreover, family-based phasing approaches 
were successfully applied for haplotyping human embryos (Otto- 
lini et al., 2015). Next-generation sequencing offers the advan- 
tage that every amplified base of the DNA is quantified with 
digital precision and thus enables detection of all types of anom- 
alies, while the microarray readout is constrained by the probe 
library. Moreover, paired-end sequencing provides additional in- 
formation since the mapped loci of the two ends together with 
the fragment size distribution can reveal structural rearrange- 
ments within the genome. Of note, sequencing the genome of 
a single cell with the Strand-seq protocol retains the strand infor- 
mation and allows the derivation of sister chromatid exchange 
(Falconer et al., 2012). This method provides valuable informa- 
tion for de novo genome assembly or the revision of existing 
assemblies. 

Although substantial progress has been made toward attain- 
ing high coverage and uniformity of WGA, there is room for 
improvement of existing methods, as recently demonstrated 
by the development of MALBAC (Zong et al., 2012) or by scaling
down the reaction volume in order to reduce amplification bias (Gole
et al., 2013; Wang et al., 2012).

Analysis of Single-Cell Genome Sequencing Data 

The first step in the data analysis after obtaining a file with 
sequencing reads is mapping to a reference genome. The 
genomic DNA sequence for most model organisms can be 
readily obtained from various online databases, such as the 
UCSC genome browser (Meyer et al., 2013) or www.ensembl.org
(Cunningham et al., 2015). Prior to mapping, it is advisable
to inspect the read quality and trim low-quality bases as well 
as remaining adaptor sequences at the end of the reads. However,
if the remaining read length is too short, reads should be
discarded in order to avoid erroneous mappings. Furthermore, 
it is recommended to remove PCR duplicates. After the mapping 
is performed, reads that map to more than a single locus should 
be discarded or counted with reduced uniform weight for each 
locus, such that the weights of each read add up to one. Subse- 
quent processing depends on the type of analysis. To determine 
CNVs, local variability in read coverage can be alleviated by seg- 
menting the genome into bins. After correcting the number of 
reads within each bin for GC bias, CNV breakpoints can be determined
based on a comparison of the change in read number between
adjacent bins to a background model (Venkatraman and
Olshen, 2007; Zhang et al., 2013). For instance, the circular bi- 
nary segmentation algorithm (Venkatraman and Olshen, 2007) 
uses t-statistics with a permutation reference distribution to infer 
p values for breakpoints. Another study employed a hidden Markov
model for CNV detection, with the hidden states corresponding
to the local copy number (Zong et al., 2012). Abnormal
copy numbers in a cancer cell were inferred after eliminating the 
amplification bias with a normalization factor derived from a non-cancer
cell. The emission probabilities of this model correspond
to binary vectors indicating whether the cancer cell had higher 
copy number than the normal cell. The numerous published 
methods for CNV detection using next-generation sequencing 
were discussed in a recent review (Zhao et al., 2013). 
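
The two preprocessing steps described above, fractional weighting of multi-mapped reads followed by binning and GC correction, can be sketched roughly as follows. This is a minimal illustration only: the function names, the input layout (a dictionary from read ID to its mapped positions), and the median-per-GC-stratum correction are assumptions made for the example and are not the procedure of any of the cited studies.

```python
from collections import defaultdict
from statistics import median


def bin_counts(alignments, bin_size):
    """Count reads per genomic bin.

    alignments: dict mapping read ID -> list of (chromosome, position) hits.
    A read with several hits contributes a uniform weight 1/n to each hit,
    so the weights of every read add up to one.
    """
    counts = defaultdict(float)
    for hits in alignments.values():
        weight = 1.0 / len(hits)
        for chrom, pos in hits:
            counts[(chrom, pos // bin_size)] += weight
    return dict(counts)


def gc_correct(counts, gc_fraction, n_strata=10):
    """Divide each bin count by the median count of bins with similar GC.

    gc_fraction: dict mapping bin -> GC fraction in [0, 1] (assumed input).
    The stratified-median scheme is a simple stand-in for a GC correction.
    """
    def stratum(b):
        return min(int(gc_fraction[b] * n_strata), n_strata - 1)

    by_stratum = defaultdict(list)
    for b, c in counts.items():
        by_stratum[stratum(b)].append(c)
    medians = {s: median(v) for s, v in by_stratum.items()}
    return {
        b: (c / medians[stratum(b)] if medians[stratum(b)] > 0 else 0.0)
        for b, c in counts.items()
    }
```

Corrected values near 1 then indicate the typical copy number of a GC stratum, and a segmentation method of choice can be run on the corrected bins.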

The genome analysis toolkit GATK comprises a bundle of 
methods for processing of next-generation sequencing data 
and variant calling (McKenna et al., 2010). In particular, it con- 
tains a Bayesian framework that can be used for SNP detection. 
For each locus, the genotype with the highest posterior probabil- 
ity is emitted if its log odds ratio exceeds a defined threshold. A 
comprehensive overview and a comparative analysis of existing 
software tools for SNP calling from next-generation sequencing 
data can be found in the literature (Nielsen et al., 2011). An 
advanced method for the detection of structural rearrangements 
utilizes paired-end read information by creating a bona fide list 
of discordantly mapped read pairs and identifies candidate rear- 
rangements supported by more than one pair from this list (Voet 
et al., 2013). 
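
To make the idea of emitting the genotype with the highest posterior probability only above a log odds threshold more concrete, a toy diploid caller is sketched below. It is not GATK's implementation: the binomial read model, the sequencing error rate, the genotype priors, and the threshold are all hypothetical values chosen for illustration.

```python
import math


def call_genotype(n_ref, n_alt, error=0.01,
                  prior=(0.998, 0.001, 0.001), min_log_odds=2.0):
    """Toy Bayesian genotype call at a single locus.

    Genotypes: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous
    alternative. Returns (genotype or None, log10 odds of the best genotype
    against all others). All default parameters are illustrative assumptions.
    """
    # Probability that a single read shows the alternative allele.
    p_alt = (error, 0.5, 1.0 - error)
    # Unnormalized log posterior; the binomial coefficient is identical
    # for all genotypes and therefore omitted.
    log_post = [
        math.log(pr) + n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for pr, p in zip(prior, p_alt)
    ]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    best = weights.index(max(weights))
    rest = sum(w for g, w in enumerate(weights) if g != best)
    log_odds = math.inf if rest == 0.0 else math.log10(weights[best] / rest)
    genotype = best if log_odds >= min_log_odds else None
    return genotype, log_odds
```

With these assumed parameters, a locus covered by one reference and one alternative read stays uncalled, whereas ten reads of each allele yield a heterozygous call.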

Although correction of GC bias is possible (Baslan et al., 2012;
Voet et al., 2013; Zhang et al., 2013), other confounding factors 
such as allelic dropout or preferential allelic amplification cannot 
be easily corrected for and may introduce false positives in SNP 
and CNV detection. Random sequencing errors represent 
another source of uncertainty for SNP detection. To increase 
confidence, repeated detection of a given anomaly in more 
than a single daughter of the same cell is required (Zong et al.,
2012). Finally, another confounding factor can be the cell-cycle
phase since replication domains of cells in S phase can be
mistaken as genuine structural aberrations (Van der Aa et al.,
2013). This problem can be avoided by using only nuclei in G1
or G2/M phase. Limiting the analysis to G2/M phase comes 
with the additional advantage of having duplicated material after 
replication of the entire genome (Wang et al., 2014). 

Comparison of Single-Cell Transcriptome Sequencing 
Techniques 

Measuring gene expression in populations of cells with microar- 
rays or RNA sequencing masks the true distribution of gene 
expression levels across cells, and it is therefore crucial to 
quantify gene expression in individual cells. The major hurdle is 
to obtain sufficient material from an individual cell that can be 
sequenced with standard next-generation sequencing proto- 
cols. Different methods for the amplification of the sub-picogram 
amount of mRNA from a single cell have been developed and are 
discussed in detail below. The main problem with any of these 
methods is the presence of amplification bias, which can distort 
the relative abundances of mRNAs from different genes. 

In the past, amplified RNA from single cells was quantified with 
microarrays (Iscove et al., 2002). More recently, a number of sin- 
gle-cell sequencing techniques with improved sensitivity were 
developed. The first protocol for single-cell sequencing was pub- 
lished in 2009 by the Surani laboratory (Tang et al., 2009) and was 
subsequently used to trace the derivation of mouse embryonic 
stem cells from the inner cell mass with single-cell resolution 
(Tang et al., 2010). The amplification method is based on pull- 
down and reverse transcription of polyadenylated RNA using a 
poly(T) primer with a specific anchor sequence. Thereafter, the 
single-stranded cDNA is polyadenylated and second-strand synthesis
is performed using a poly(T) primer with another anchor
sequence. The double-stranded cDNA is then PCR amplified 
from primers against the two anchor sequences, and the resulting 
material is fragmented prior to library preparation. Although 
SOLiD sequencing was applied initially, the protocol is compatible
with Illumina sequencing, which has become the prevalent
method for single-cell sequencing. An initial method that lever- 
aged the integration of DNA barcodes to allow pooling of the ma- 
terial extracted from different cells along with preservation of 
strand information was termed single-cell tagged reverse tran- 
scription (STRT) (Islam et al., 2011). This technique exploits the 
template-switching property of the reverse transcriptase to tag 
the 5' end of polyadenylated mRNA molecules. Following PCR 
amplification, the tagged ends are pulled down and sequenced, 
yielding a strong 5' end bias of the sequencing read. A comple- 
mentary method termed cell expression by linear amplification 
and sequencing (CEL-seq) amplifies polyadenylated mRNA line- 
arly from a T7 promoter introduced during cDNA synthesis, 
thereby reducing amplification bias and alleviating the need for 
a template switch. Here, only fragments derived from the 3' end 
of the mRNA are sequenced. CEL-seq and STRT-seq integrate 
a barcode into the sequencing primer, a stretch of eight nucleo- 
tides that uniquely labels all mRNAs from the same cell. In order 
to robustly assign mRNAs to different cells, each pair of barcodes 
should differ in at least two positions. To obtain read coverage 
along the entire transcript, the Smart-seq and Smart-seq2 
methods are a more recent alternative (Picelli et al., 2013; Ram- 
skold et al., 2012). Similar to STRT, this approach reverse tran- 
scribes polyadenylated RNA and exploits the template-switching 
capacity of the reverse transcriptase. However, using the Nextera 
technology, the Tn5 transposase simultaneously fragments the 
cDNA and ligates sequencing adaptors to all fragments, yielding 
sequencing reads derived from the entire transcript. Another 
more recent method that yields read coverage of the entire 
gene body is the Quartz-seq method, which is similar to the 
approach developed by the Surani laboratory (Tang et al., 2009) 
but achieves higher sensitivity and reproducibility (Sasagawa 
et al., 2013). Moreover, two whole-transcript sequencing 
methods for low starting material have been published, exploiting 
either φ29 DNA polymerase or semi-random-primed PCR-based
amplification (Pan et al., 2013). 

To reduce amplification bias, unique molecular identifiers 
(UMI) (Kivioja et al., 2012) have been integrated into some of 
the single-cell sequencing protocols. UMIs are stretches of 
four to ten random nucleotides integrated into a sequencing 
primer and serve as a random barcode for each mRNA molecule. 
Upon binding of the sequencing primer, each mRNA is uniquely 
labeled with a random barcode and the labeled end of the mRNA 
is amplified along with the barcode. After sequencing, the ampli- 
fication bias can be eliminated by counting each label only once 
instead of the reads derived from all amplicons. The number of 
UMIs can thus be directly translated into the number of 
sequenced molecules from a cell after application of a mathe- 
matical correction to account for the effect of random counting 
statistics (Grun et al., 2014; Kivioja et al., 2012). 
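
A rough sketch of UMI-based counting and the correction for random counting statistics is given below. The correction shown is the standard occupancy argument: if n molecules each draw one of K = 4^L equally likely UMIs of length L, the expected number of distinct UMIs is K(1 - (1 - 1/K)^n), which can be inverted to estimate n from the observed number k. Whether this matches the exact formula used in the cited studies is an assumption, and the function names are hypothetical.

```python
import math


def count_umis(reads):
    """Count distinct UMIs per gene for one cell.

    reads: iterable of (gene, umi) pairs; repeated pairs are amplification
    copies of the same molecule and are counted only once.
    """
    seen = {}
    for gene, umi in reads:
        seen.setdefault(gene, set()).add(umi)
    return {gene: len(umis) for gene, umis in seen.items()}


def estimate_transcripts(k_observed, umi_length):
    """Estimate the number of sequenced molecules from k distinct UMIs.

    Assumes all K = 4**umi_length UMIs are equally likely, so that
    n = ln(1 - k/K) / ln(1 - 1/K).
    """
    K = 4 ** umi_length
    if k_observed >= K:
        raise ValueError("UMI space is saturated; no estimate possible")
    return math.log(1.0 - k_observed / K) / math.log(1.0 - 1.0 / K)
```

For small counts the estimate is close to the observed number of UMIs; the correction only becomes noticeable when a sizeable fraction of the available UMIs has been observed for a gene.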

UMIs can only be used for methods that sequence a single 
tag derived from a given mRNA and have been integrated, for
example, into the STRT protocol (Islam et al., 2014) and into
modified versions of CEL-seq (Grun et al., 2014; Jaitin et al., 
2014). It has been shown that counting UMIs instead of reads 
leads to a 2-fold reduction of technical noise (Grun et al., 2014). 

An overview of three common single-cell sequencing methods 
is given in Figure 1. In order to select the appropriate sequencing
technology, one has to consider the goal of the experimental 
study. For example, in order to investigate gene expression het- 
erogeneity between cells, the technical variability should be 
minimized and a technology that allows integration of UMIs 
should be chosen. However, if information along the entire tran- 
script is required, for instance, to examine splicing patterns, a 
technology that yields whole-transcript coverage should be 
selected. Moreover, methods that sequence either the 5' or 3' 
end of a transcript provide single-cell information on the tran- 
scriptional start site or polyadenylation site usage, respectively. 
Another aspect to consider is ease of the experimental procedure
and sequencing cost per cell. An increasing number of protocols
can be conveniently performed on the Fluidigm C1 multi-fluidic
auto-prep system. This device permits the isolation and
processing of the cells, with the important benefit that each 
cell is imaged. This allows controlling for multiple cells per well 
and empty wells. However, sequencing-chips that can be used 
in this device come in fixed geometries and preferentially select 
cells of particular sizes. Moreover, this technology is relatively 
expensive. A massively parallel RNA single-cell sequencing 
framework termed MARS-seq (Jaitin et al., 2014) has been 
developed based on the CEL-seq technology and employs auto- 
mated processing of single cells sorted into 384-well plates. 

Recently, two advanced droplet-based microfluidic methods, 
termed Drop-seq (Macosko et al., 2015) and inDrop sequencing 
(Klein et al., 2015), were published that can dramatically increase
the throughput to thousands of cells and at the same time mini- 
mize the sequencing costs. Both of these methods rely on the 
separation of cells into nanoliter-sized aqueous droplets in an 
oil-water emulsion, which contains sequencing primers with 
unique cell barcodes and UMIs. The co-occurrence of multiple 
cells in the same droplet is avoided by a low cell-loading rate 
into the droplets. In Drop-seq, cDNA is PCR amplified, while inDrop
sequencing amplifies cDNA by in vitro transcription akin
to CEL-seq. In terms of technical noise and sensitivity, these 
methods compare favorably to previous protocols. Drop-seq 
was used to characterize mouse retinal cells, while inDrop 
sequencing was applied to explore cellular heterogeneity during 
mouse embryonic stem cell differentiation. However, the set-up 
for neither of these methods is commercially available, and the 
user is required to build a microfluidic device based on the infor- 
mation provided by the authors. 
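
Why a low cell-loading rate suppresses droplets with multiple cells can be illustrated with a small calculation. The text does not state a loading model; the sketch below assumes that the number of cells per droplet follows a Poisson distribution with a given mean, which is a common simplification and not a claim about either protocol.

```python
import math


def droplet_occupancy(mean_cells_per_droplet):
    """Occupancy probabilities under an assumed Poisson loading model."""
    lam = mean_cells_per_droplet
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Fraction of cell-containing droplets that hold more than one cell.
        "multiplet_fraction": p_multiple / (1.0 - p_empty),
    }
```

Under this assumption, lowering the mean loading reduces the fraction of occupied droplets that contain several cells, at the price of a larger share of empty droplets.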

Although there has been much progress in increasing 
throughput and lowering costs of single-cell sequencing, there 
has been only a moderate improvement of the sequencing sensi- 
tivity during the last 3 years. The most common method to quan- 
tify sensitivity is the usage of external spike-in RNA of known 
concentration. The spike-in concentration should be chosen 
such that spike-in RNA contributes 1%-5% of the number of
mRNA molecules (Hashimshony et al., 2012). Most of the 
recently published sensitivity estimates are derived from a set 
of 92 spike-in RNAs designed by the External RNA Controls 
Figure 1. Three Common Experimental Protocols
for Single-Cell Sequencing

(A) CEL-seq. Polyadenylated mRNA is reverse
transcribed from an oligo-dT primer containing
the Illumina P1 adaptor, a cell barcode, and a T7
promoter. The sequencing primer can, in principle,
also accommodate a UMI. Following second-strand
synthesis, the cDNA is amplified by in vitro
transcription from the T7 promoter, and the Illumina
P2 adaptor is ligated after fragmentation. The
sequencing reads are thus derived from the mRNA
3' end.

(B) STRT-seq. Polyadenylated RNA is reverse
transcribed from an oligo-dT primer containing the
Illumina P1 adaptor and a PvuI restriction site.
After full-length reverse transcription, a template-switching
oligo with another Illumina P1 adaptor
and the UMI is added to the 5' end of the transcript.
Following second-strand synthesis, the
cDNA is then PCR amplified using primers complementary
to the Illumina P1 adaptor. Fragmentation
and ligation of the Illumina P2 adaptor and
the cell barcode are performed simultaneously
utilizing the Tn5 transposase. To retain only 5'
ends for sequencing, the 3' ends are digested by
the PvuI restriction enzyme.

(C) Smart-seq2. Polyadenylated RNA is reverse 
transcribed from an oligo-dT with a PCR primer.
The same PCR primer is part of the template-switching
oligo added to the 5' end of the cDNA
upon reverse transcription. After PCR amplifica- 
tion, the cDNA is fragmented by tagmentation 
using the Tn5 transposase. Simultaneously, Tn5 
ligates different 5' and 3' primers to the frag- 
ments. Another round of PCR introduces Nextera- 
sequencing primers to the ends of the frag- 
ments, enabling sequencing with full-length read 
coverage. However, Smart-seq2 does not allow 
for the integration of UMIs. 



Consortium (ERCC) (Baker et al., 2005) and cover a wide range, 
from 5% to 40%. However, independent methods such as imag- 
ing-based molecule counting in single cells by single-molecule 
fluorescent in situ hybridization (smFISH) (Raj et al., 2008) yield 
deviating estimates (Grun et al., 2014). Moreover, the absolute 
number of transcripts per cell is comparable when sequencing 
cells of the same type with different methods. The ERCC 
spike-in RNAs are relatively short in comparison to mammalian 
genes, have short poly(A) tails, and lack a 5' cap. It is unclear 
how much these differences between external and cellular 
RNA— as well as the fact that the external RNA is not spiked 
directly into the cell— affect the relative sequencing efficiencies 
of cellular and spike-in RNA. 

Data Analysis of Single-Cell Transcriptome Data 
Preprocessing and Read Mapping 

In order to retrieve the maximum information from single-cell 
mRNA sequencing data, a careful experimental design is 
required (see Box 1). Following sequencing, a number of data 
processing and filtering steps are recommended to reduce the 
impact of technical noise. The first analysis step is usually a qual- 
ity filtering or trimming of the sequencing reads prior to mapping 



the reads to a reference database. Standard tools, e.g., fastqc, 
permit a quality analysis of the sequenced library, and standard 
mapping tools, such as bwa (Li and Durbin, 2010), allow trimming
of low-quality bases from the end of the reads. However, a minimum
remaining read length (>35 bp for mouse or human) after
trimming should be required in order to avoid false-positive hits. 
For the mapping, available standard tools developed for bulk 
RNA-seq analysis can be used (Garber et al., 2011). However, 
sequenced cell barcodes, UMIs, and other primer-derived 
sequences have to be removed from the remaining read to be 
mapped to the reference database. Usually, one read of a pair 
contains all of the index information, while the other one can 
be mapped to the gene models (see Figure 1). In general, reads 
can be mapped to the genome followed by expression quantifi- 
cation via intersecting the read coverage of the genome with 
gene model annotations. However, this can lead to a larger num- 
ber of reads mapping to multiple loci, for instance, due to the 
existence of inactive pseudogenes. Using the transcriptome as 
a reference reduces the sequence space and increases the 
fraction of unique reads. Since non-unique reads can introduce 
spurious correlations between different genes across single 
cells, it is advisable to discard these reads prior to analysis. 
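
The trimming and length filtering recommended above can be pictured with the following simplified sketch. It is not the trimming algorithm implemented in bwa or any other tool: it merely removes trailing bases below a quality cutoff and discards reads that are not longer than the stated minimum. The quality cutoff of 20 and the function name are assumptions.

```python
def trim_read(seq, quals, min_quality=20, min_length=35):
    """Trim low-quality bases from the 3' end of a read.

    seq: read sequence; quals: list of per-base quality scores.
    Returns the trimmed (seq, quals), or None if the remaining read is
    not longer than min_length and should be discarded before mapping.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    if end <= min_length:
        return None
    return seq[:end], quals[:end]
```

In a barcoded protocol, this step would be applied to the read that is mapped to the gene models, after the cell barcode, UMI, and other primer-derived sequences have been removed.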
Box 1. Design of Single-Cell Sequencing Experiments



The power of single-cell sequencing crucially depends on two parameters: the number of cells and the sequencing complexity. These parameters 
can be controlled by the experimental design and should be chosen according to the goal of the study. The size of the dataset, i.e., the number of
cells, is important for profiling the cell composition of a sample with high sensitivity. Typically, several hundreds of cells have to be sequenced in order
to capture not only abundant, but also rare, cell types. Possible biases that might occur during purification of the single-cell sample due to cell size or 
other factors have to be considered. Moreover, one should incorporate an estimate for the success rate, since a number of single-cell samples will 
likely yield only little or no material due to RNA degradation or low amplification efficiency. This estimate can be derived from trial experiments. The 
second parameter is the library complexity. Since the efficiency of single-cell mRNA sequencing is still limited, it is important to sequence each sin- 
gle cell with sufficient sequencing depth. If transcripts are counted with UMIs, the sequencing depth should be adjusted such that every transcript is 
sequenced at least three to four times. This ensures that even lowly expressed genes can be quantified and do not drop out due to sampling noise. 
To determine how many cells can be sequenced at once, e.g., on a single lane of an Illumina sequencing machine, the fraction of reads that can be
mapped to the transcriptome has to be taken into account. This fraction is typically lower than 50%, since in most protocols additional abundant 
contributions can originate from sequencing products containing only primer or adaptor sequences (Grun et al., 2014). For example, assuming that 
~10,000 transcripts per cell have been amplified and 50% of the reads can be mapped to the transcriptome, about 2,500 cells can be sequenced on
a single lane of an Illumina NextSeq machine with 200 million reads. A fraction of those, typically around 10% to 20%, will not pass the quality
filtering. Microfluidic devices like the Fluidigm C1 further provide an image of each cell being processed and allow filtering of wells containing no
or more than a single cell. 

To avoid batch effects, one should follow general guidelines applicable for bulk sequencing. For instance, single-cell libraries corresponding to 
different conditions should not be sequenced on separate lanes but, rather, distributed in equal fractions across the same set of lanes. 
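
The lane-capacity estimate in Box 1 can be reproduced with a few lines of arithmetic. In the sketch below, the function and parameter names are hypothetical; the figure of about 2,500 cells follows when each transcript is sequenced four times, the upper end of the recommended three to four reads per transcript.

```python
def cells_per_lane(reads_per_lane, mapped_fraction,
                   transcripts_per_cell, reads_per_transcript):
    """Number of cells that fit on one lane at the desired depth."""
    usable_reads = reads_per_lane * mapped_fraction
    reads_needed_per_cell = transcripts_per_cell * reads_per_transcript
    return int(usable_reads // reads_needed_per_cell)


# 200 million reads, 50% mappable, ~10,000 transcripts per cell,
# each transcript sequenced four times on average.
print(cells_per_lane(200_000_000, 0.5, 10_000, 4))
```

The expected loss of cells during quality filtering mentioned in the box is not included in this calculation and would further reduce the number of usable cells.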



Due to the low read coverage of the gene body in single-cell 
sequencing experiments, isoform quantification with standard 
methods such as Cufflinks (Trapnell et al., 2010) can be problematic.
If isoform information is not essential for the study, an ideal
strategy is to merge all isoforms of a given gene into a so-called 
gene locus and quantify the expression of these gene loci. Inde- 
pendent of the reference, it is important to consider specific as- 
pects of the experimental strategy. If sequencing protocols are 
used that enrich for the 5' or 3' end of an mRNA, the quality of 
the gene annotation can have a huge impact on the sensitivity. 
Gene models tend to be less reliable at both ends of a transcript, 
and an experimental strategy for improving 5' or 3' end annota- 
tion might be beneficial, in particular, for non-standard model 
organisms. For example, Junker et al. applied an amended
CEL-seq protocol to sequence longer reads at low depths on
bulk material in order to accurately detect 3' polyadenylation
sites for zebrafish embryos (Junker et al., 2014). Finally, the reference
database has to be augmented by sequences representing
any spike-in RNA added to the samples. 

Expression Quantification and Filtering
In order to arrive at expression levels for all genes, PCR dupli- 
cates should be removed. Next, the cell of origin is determined 
based on the sequenced cell barcode (Figure 2A). If the base- 
calling quality is not sufficiently high at the cell barcode position 
within the read, an error-tolerant assignment scheme can be 
applied by aggregating all barcodes up to a single mismatch 
away from the perfect sequence. In order to apply this scheme, 
however, each pair of cell barcodes has to differ in at least two 
positions. If UMIs are available, the number of different UMIs 
per gene in each cell has to be converted into a transcript count 
estimate (Figure 2A) by applying a statistical correction to account
for sampling effects (Grun et al., 2014; Kivioja et al., 2012).
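
The error-tolerant barcode assignment described above might look as follows in practice. This is a minimal sketch with hypothetical function names, assuming that all barcodes have the same length. Note that a pairwise distance of two guarantees only that a single error cannot turn one barcode into another; an observed sequence that lies one mismatch away from two different barcodes is left unassigned here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes of the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the unique barcode within max_mismatches, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

Before sequencing, `min_pairwise_distance` can be used to confirm that a barcode set satisfies the requirement of at least two differing positions for every pair.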

Once read or transcript counts have been determined for all 
cells, it is recommended to filter out cells of low yield 
(Figure 2B). These samples can arise already prior to or during 
isolation of the cells, e.g., due to stress or apoptosis, or can 
occur due to incomplete lysis, RNA degradation, or low
sequencing efficiency of a particular cell. The total number of
reads or UMI-derived transcript counts per cell is a first proxy 
for the sample quality. Applying a threshold to discard cells in 
the lower tail of the distribution of read or transcript counts, 
respectively, safeguards against artifacts arising from low-qual- 
ity cells. The expression of spike-in RNA can be utilized to iden- 
tify and discard samples of low sequencing efficiency. Since 
the number of spike-in RNA molecules should be identical for all samples,
the identification of low yield samples is straightforward 
(Figure 2B). On the other hand, a relatively large ratio between 
transcript or read counts, respectively, of spike-in and cellular 
RNA reveals cells that contribute little material, e.g., due to 
RNA degradation or incomplete cell lysis (Figure 2B). The 
described strategies are only guidelines for filtering, and the 
exact method strongly depends on the dataset under examina- 
tion. For example, if the cell volume varies substantially within 
a dataset, the total transcript count should only be subject to 
mild filtering, while the transcript count of the spike-in RNA is still 
a good proxy for the sequencing efficiency and can be used to 
discard low yield samples. 
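
The three filtering criteria discussed above (total cellular counts, spike-in counts, and the ratio between the two) can be combined as in the following sketch. The function name and all thresholds are hypothetical; as noted in the text, suitable cutoffs depend strongly on the dataset and should be chosen from the observed distributions.

```python
def filter_cells(cellular_counts, spike_in_counts,
                 min_cellular, min_spike_in, max_spike_ratio):
    """Return the IDs of cells that pass all three quality criteria.

    cellular_counts / spike_in_counts: dicts mapping cell ID -> total
    read or transcript count of cellular genes and of spike-in RNA.
    """
    keep = []
    for cell, cellular in cellular_counts.items():
        spike = spike_in_counts[cell]
        ratio = spike / cellular if cellular > 0 else float("inf")
        low_yield = cellular < min_cellular        # lower tail of total counts
        low_efficiency = spike < min_spike_in      # poor sequencing efficiency
        low_input = ratio > max_spike_ratio        # little cellular material
        if not (low_yield or low_efficiency or low_input):
            keep.append(cell)
    return keep
```

For datasets with strongly varying cell volume, `min_cellular` would be set permissively while the spike-in criterion is retained.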

Data Normalization

For subsequent analysis, an appropriate normalization of the 
expression data is necessary. In the case of read-based quantification,
normalization to transcripts per one million reads (TPM), if reads are
generated from only one end of the transcript, or reads per kilobase of
transcript per one million reads (RPKM), if reads cover the entire
transcript, is appropriate. Alternatively, standard quantification methods like
Cufflinks (Trapnell et al., 2010) yield normalized expression 
values. More refined normalization schemes have been devel- 
oped for bulk RNA-seq data (Anders and Huber, 2010). Here, 
derivation of a size factor for each replicate accounts for vari- 
ability in sequencing depth between replicates, and a similar 
method can be applied to normalize single-cell data (Brennecke 
et al., 2013). If transcripts are counted with UMIs, cell-to-cell dif- 
ferences in transcript numbers are to a certain extent biologically 
meaningful and indicative of variations in the RNA content of a 
Figure 2. Quantification of mRNA Expression
with UMIs

(A) For single-cell sequencing, RNA is isolated 
from individual cells and, after labeling with cellular 
barcodes, amplified by PCR or in vitro transcrip- 
tion. Sequencing reads are subject to quality 
filtering and trimming before mapping to reference 
sequences representing all genes of the organism. 
In (A), only two cells with two different genes are 
shown for simplicity. Amplification bias can distort 
the relative expression of the two genes and can 
be eliminated by counting the number of UMIs per 
gene instead of sequencing reads.

(B) Cells with low yield due to RNA degradation or 
low sequencing efficiency should be discarded. 
These cells can be identified based on low total 
transcript counts (left), which can be explained by 
low-amplification efficiency (red bar, middle) or 
low-input material (orange bar, right). The middle 
panel depicts the total number of spike-in RNA, 
which should theoretically be the same in all cells. 
Variations are due to variability in sequencing ef- 
ficiency. The right panel shows the ratio between 
spike-in RNA and transcripts of cellular genes. 
High ratios correspond to reduced amounts of 
cellular RNA. 

(C) Data normalization by down-sampling. The 
same number of transcripts is randomly picked 
from each cell. Shown is a toy example with three 
cells and four different genes. 
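
Down-sampling as depicted in (C), together with the simpler rescaling of each cell to the median total transcript count, can be sketched as follows. The data layout (a dictionary of per-cell gene counts), the function names, and the choice to skip cells with fewer transcripts than the target are assumptions for illustration; cells are assumed to have non-zero totals after quality filtering.

```python
import random
from statistics import median


def median_normalize(counts):
    """Rescale every cell to the median total transcript count."""
    totals = {cell: sum(genes.values()) for cell, genes in counts.items()}
    target = median(totals.values())
    return {
        cell: {g: n * target / totals[cell] for g, n in genes.items()}
        for cell, genes in counts.items()
    }


def down_sample(counts, n_transcripts, seed=0):
    """Randomly pick the same number of transcripts from each cell."""
    rng = random.Random(seed)
    sampled = {}
    for cell, genes in counts.items():
        # One entry per transcript molecule, labeled by its gene.
        pool = [g for g, n in genes.items() for _ in range(n)]
        if len(pool) < n_transcripts:
            continue  # too few transcripts to down-sample this cell
        picked = rng.sample(pool, n_transcripts)
        sampled[cell] = {g: picked.count(g) for g in genes}
    return sampled
```

Because down-sampling draws integer counts without replacement, every retained cell ends up with exactly the same total, whereas median normalization yields non-integer values.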



sampled to a number lower or equal 
than their actual transcript count. How- 
ever, for most applications, this approach 
is preferable since technical artifacts such 
as batch effects are efficiently eliminated. 

Biological Insights from Single-Cell 
Transcriptome Data 
Identification of Ceii Types 

Perhaps the most important application 



cell. However, cell-to-cell variability in sequencing efficiency and 
other sources of technical noise contribute to the observed vari- 
ability. In principle, the technical cell-to-cell variability could be 
deconvolved with the help of spike-in RNA. The ratio of the num- 
ber of sequenced spike-in molecules over the number of spike-in 
molecules added to the cell extract yields a conversion factor. In 
theory, dividing the number of sequenced transcripts by this con- 
version factor yields an estimate of the actual number of tran- 
scripts in a cell. However, as already discussed, commonly 
used ERCC spike-in RNA does not provide a good standard for 
absolute quantification. For most applications, the relative contri- 
bution of each gene to the transcriptome will be the relevant 



of single-cell mRNA sequencing is the 
identification of cell types in a complex 
mixture. The transcriptome of a cell can be interpreted as a finger- 
print revealing its identity. An unbiased screening of randomly 
sampled cells from a mixture, such as an organ, could therefore 
reveal the cellular composition of this sample. A number of 
studies could recover known cell types and identify novel marker 
genes in diverse systems, for example in the spleen (Jaitin et al., 
201 4), the lung epithelium (Treutlein et al., 201 4), orthe retina (Ma- 
cosko et al., 2015). Another recent landmark paper revealed the 
complex cellular composition of the mouse hippocampus and 
uncovered novel cell types (Zeisel et al., 2015). Although these 
studies convincingly demonstrated that single-cell mRNA 
sequencing is a powerful method for cell type identification. 



readout, and in these cases, simpler normalization schemes 
apply. One possibility is the normalization of the total transcript 
count in each cell to the median of the total transcript count 
across all cells. Alternatively, subsampling of the same 
number of transcripts from each cell, termed down-sampling 
(Figure 2C), is more efficient in eliminating technical variability 
but comes with a loss in complexity since all cells are down- 



computational methods to leverage the full complexity within sin- 
gle-cell transcriptome data are just beginning to emerge. Distin- 
guishing cell types in a mixture corresponds to a typical unsuper- 
vised learning problem in which data points, in this case given by 
single-cell transcriptomes, are grouped into clusters reflecting 
subsets of data points that are more similar to each other than 
to the remaining data points (Figure 3A). A commonly applied

Figure 3. Single-Cell Sequencing Allows the Inference of Cell Type
Composition

(A) Unsupervised learning can be used to distinguish different cell
types in a random sample of sequenced cells from a complex mixture.
K-means clustering with a cluster number estimated by gap statistics
(top) or hierarchical clustering (bottom) based on transcriptome
similarity can be used to identify different abundant cell types. All
data shown in the figure are derived from 238 random cells isolated from
mouse intestinal organoids (Grün et al., 2015).

(B) Dimensional reduction algorithms can be applied for data
visualization. The t-SNE method (top) resolves the local structure of
the data but tends to group outliers together by their dissimilarity to
bigger clusters. PCA (middle) allows visual inspection of data
separation along the main axis of variability but can be inconvenient if
a larger number of principal components contribute substantial
variability. Classical multidimensional scaling achieves dimensional
reduction with well-conserved point-to-point distances. Outliers are
well separated, but dense clusters tend to be condensed. K-means
clusters (A) are highlighted in all of the maps, and intestinal cell
types are indicated in the t-SNE map.

(C) The RaceID algorithm identifies rare cell types within more abundant
groups separated by k-means clustering. The algorithm can detect cell
types represented by only a single cell in the mixture.



visual method is principal component analysis (PCA), which converts a
set of correlated variables into a set of orthogonal uncorrelated
variables, termed principal components. These principal components are
ordered by the fraction of the total variance they explain, and usually
only the first two or three principal components are analyzed. Visual
inspection of a scatterplot showing the first two principal components
can already reveal the main subgroups in the data, i.e., the abundant
cell types (Pollen et al., 2014; Shalek et al., 2014; Treutlein et al.,
2014). Moreover, a number of algorithms for dimensional reduction exist
that can be used to obtain an approximate visualization of the data in
two dimensions (Figure 3B). These algorithms take a matrix with all
pairwise distances of data points as input and project these points onto
a low-dimensional space, trying to preserve the original pairwise
distances as much as possible. For example, classical multidimensional
scaling was used to visualize intratumoral heterogeneity in glioblastoma
(Patel et al., 2014), and t-distributed stochastic neighbor embedding
(t-SNE) (Van der Maaten and Hinton, 2008) beautifully visualized
heterogeneity within the retina (Macosko et al., 2015) or the
hippocampus (Zeisel et al., 2015). These methods take arbitrary
similarity measures as input. The most common choices are the Euclidean
distance between vectors with expression values for all genes or a
correlation-based distance between these vectors, e.g., 1 - Pearson’s
correlation coefficient.



To identify cell types more systematically, conventional clustering
methods can be applied. For instance, hierarchical clustering was used,
alone or in combination with PCA, to explore cellular heterogeneity
(Patel et al., 2014; Pollen et al., 2014; Treutlein et al., 2014). In
addition, more sophisticated algorithms have been specifically adjusted
for cell type profiling. Jaitin et al. utilized hierarchical clustering
to initialize a probabilistic mixture model for cell type classification
(Jaitin et al., 2014). Zeisel et al. developed a clustering method based
on sorting points into neighborhoods (SPIN) (Tsafrir et al., 2005). In
an iterative procedure, an optimal splitting of the cell-to-cell
correlation matrix is determined after ordering the expression matrix by
cells and genes using SPIN.
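As a minimal sketch of the conventional clustering described above, the
following Python example groups cells by average-linkage hierarchical
clustering on a correlation-based distance (1 - Pearson's correlation
coefficient). The function names, the choice of average linkage, and the
toy expression matrix are assumptions made for illustration; they are
not taken from any of the cited studies.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_distance(x, y):
    """Correlation-based distance: 1 - Pearson's correlation coefficient."""
    return 1.0 - pearson(x, y)


def hierarchical_clusters(cells, k, dist=correlation_distance):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    cells: list of expression vectors (one per cell, same gene order).
    Returns a list of clusters, each a sorted list of cell indices.
    """
    n = len(cells)
    d = [[dist(cells[i], cells[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                pairs = len(clusters[a]) * len(clusters[b])
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy matrix (hypothetical values): rows are cells, columns are genes.
toy_cells = [
    [10, 9, 1, 0],
    [12, 8, 0, 1],
    [1, 0, 11, 9],
    [0, 2, 9, 12],
    [11, 10, 2, 1],
]
toy_clusters = hierarchical_clusters(toy_cells, k=2)
```

Passing a different `dist` function (for example a Euclidean distance)
changes the similarity measure without touching the clustering loop. The
cubic-time loop is only meant for small examples.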

A general problem for cell type classification is the presence of 
confounding factors due to technical and biological variability. 
The result of any clustering routine has to be carefully examined 
for batch effects leading to unwanted clustering by experi- 
mental batch, sequencing library, or other technical factors. 
Batch effects can be reduced by normalization strategies such 
as down-sampling that eliminate differences in complexities 
between libraries. However, additional confounding factors can arise
from biological heterogeneity such as cell-to-cell differences in the
cell-cycle phase. If only cells of a similar size are analyzed, cell
sorting can be used to purify cells within a given cell-cycle phase.
Otherwise, computational approaches can be used to deconvolve
cell-cycle-related variability. A recently published approach utilizes
latent variable models to account for the cell cycle and other hidden
factors (Buettner et al., 2015). On the other hand, normalization
schemes that eliminate absolute cell-to-cell differences in transcript
count are often sufficient.
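The two normalization schemes referred to in this Primer, scaling each
cell to the median total transcript count and down-sampling every cell
to the same number of transcripts, can be sketched as follows. This is
an illustrative sketch only: the data layout (a dictionary of per-cell
gene counts), the function names, and the decision to drop cells with
fewer transcripts than the requested depth are assumptions, not a
description of any published pipeline.

```python
import random
import statistics


def median_normalize(counts):
    """Rescale each cell so that its total equals the median total.

    counts: dict cell -> dict gene -> transcript count.
    Assumes every cell has a nonzero total transcript count.
    """
    totals = {cell: sum(genes.values()) for cell, genes in counts.items()}
    target = statistics.median(totals.values())
    return {
        cell: {gene: n * target / totals[cell] for gene, n in genes.items()}
        for cell, genes in counts.items()
    }


def down_sample(counts, depth, seed=0):
    """Draw `depth` transcripts per cell without replacement.

    Counts must be integers. Cells with fewer than `depth` transcripts
    are dropped (an assumption of this sketch).
    """
    rng = random.Random(seed)
    result = {}
    for cell, genes in counts.items():
        # One pool entry per observed transcript molecule.
        pool = [gene for gene, n in genes.items() for _ in range(n)]
        if len(pool) < depth:
            continue
        sampled = {gene: 0 for gene in genes}
        for gene in rng.sample(pool, depth):
            sampled[gene] += 1
        result[cell] = sampled
    return result


# Hypothetical toy counts for three cells and two genes.
toy_counts = {
    "cell1": {"geneA": 30, "geneB": 10},
    "cell2": {"geneA": 5, "geneB": 15},
    "cell3": {"geneA": 50, "geneB": 50},
}
toy_normalized = median_normalize(toy_counts)
toy_down_sampled = down_sample(toy_counts, depth=20)
```

After median normalization every cell has the same total but keeps
fractional counts, whereas down-sampling keeps integer counts at the
price of discarding transcripts from the deeper libraries.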

A major challenge for any cell type inference method is the
identification of rare cell types. With a frequency of ~1% or less in a
sample of sequenced cells from a complex mixture, these cell types
typically occur as outliers. Although unsupervised learning methods for
outlier identification exist, these approaches often cannot capture the
full complexity of the data. For instance, classifying a variety of
different rare cell types in an organ cannot be achieved by these
methods (Grün et al., 2015). In a recent study, an algorithm for rare
cell type identification (RaceID) was introduced (Grün et al., 2015)
that first infers abundant cell types by k-means clustering followed by
a systematic outlier screening (Figure 3C). In this step, the
cell-to-cell variability of every gene is compared to a background model
that accounts for technical and biological noise within a cluster. Cells
exhibiting transcript counts with a low p value according to this
background model are identified as outliers and are used as new cluster
seeds. RaceID was shown to identify rare mouse intestinal cell types
with high sensitivity and specificity and discovered novel rare subtypes
of the enteroendocrine lineage.
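The idea of screening a cluster for outlier cells against a background
model can be illustrated with a deliberately simplified sketch. Here the
background for each gene is assumed to be a Poisson distribution with
the cluster mean, which ignores the biological and global technical
noise that the background model described above accounts for; the
function names, the threshold, and the toy counts are likewise
hypothetical, and no multiple-testing correction is applied. This is not
the RaceID implementation.

```python
import math


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def outlier_cells(cluster, p_threshold=1e-3):
    """Flag cells with at least one improbably high gene count.

    cluster: dict cell -> dict gene -> integer transcript count, for the
    cells assigned to one cluster. The Poisson background with the
    cluster mean is a simplifying assumption of this sketch.
    """
    genes = list(next(iter(cluster.values())).keys())
    means = {g: sum(c[g] for c in cluster.values()) / len(cluster) for g in genes}
    flagged = []
    for cell, counts in cluster.items():
        if any(
            means[g] > 0 and poisson_upper_tail(counts[g], means[g]) < p_threshold
            for g in genes
        ):
            flagged.append(cell)
    return flagged


# Hypothetical cluster of nine cells; the last cell strongly expresses geneB.
a_counts = [4, 5, 6, 5, 4, 6, 5, 5, 5]
b_counts = [0, 1, 0, 0, 1, 0, 0, 0, 20]
toy_cluster = {
    f"cell{i + 1}": {"geneA": a, "geneB": b}
    for i, (a, b) in enumerate(zip(a_counts, b_counts))
}
toy_outliers = outlier_cells(toy_cluster)
```

In this toy cluster only the ninth cell is flagged. Flagged cells could
then serve as seeds for new clusters, as described in the text.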

Identification of Marker Genes 

Once cell types can be delineated, the data can be mined for 
specific marker genes to better characterize a cell type and, 
with the help of cell surface markers or fluorescent reporter 
genes, allow the purification of a cell type. The discovery of a 
marker gene requires the identification of differentially expressed 
genes between the cell type of interest and the remaining cells. 
For this task, available methods for modeling over-dispersed 
count statistics in bulk sequencing data, such as DESeq 
(Anders and Huber, 2010), can be applied. Another probabilistic 
method, which was developed specifically for single-cell 
sequencing data, accounts for the relatively high rate of dropout 
events in these data, i.e., transcripts that escaped reverse tran- 
scription and therefore could not be sequenced (Kharchenko 
et al., 2014). 

Inference of Differentiation Dynamics 

Related to the cell type inference is the application of single-cell
transcriptomics to reveal differentiation pathways. A comparison of
single-embryo transcriptomes collected at subsequent stages of nematode
embryonic development has already revealed insights into gene expression
changes underlying the emergence of the three germ layers (Hashimshony
et al., 2015). More generally, if a sample is analyzed that contains all
differentiation stages of a given cell lineage, a pseudo-temporal
ordering of single-cell transcriptomes can be inferred. For example,
such a sample can be composed of cells collected at different time
points during in vitro differentiation or can be a random sample of a
mitotic adult stem cell differentiation system such as the intestinal
epithelium. The general idea is that differentiation is accompanied by
continuous temporal changes in gene expression and that ordering of
single-cell transcriptomes by similarity reflects the succession of
these changes, yielding a pseudo-temporal ordering of the cells. One
existing method termed Monocle combines dimensional reduction with the
construction of a minimum spanning tree (Trapnell et al., 2014). Monocle
is an unsupervised approach that can infer branching into multiple
lineages and was used to elucidate gene expression dynamics during
differentiation of primary human fibroblasts. Another more recent method
relies on the use of diffusion maps to define differentiation
trajectories, incorporating the idea that the movement of a cell within
the transcriptional landscape follows diffusion-like dynamics (Haghverdi
et al., 2015).
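To make the idea of ordering cells by transcriptome similarity concrete,
the sketch below builds a minimum spanning tree on Euclidean distances
and orders cells along the longest path through that tree. It is a toy
illustration and not Monocle itself: the dimensional reduction step is
omitted, using the longest tree path as the ordering is an assumption of
this sketch, and all function names and coordinates are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minimum_spanning_tree(points):
    """Prim's algorithm; returns adjacency dict node -> [(neighbor, weight)]."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[j] = (distance to the tree, closest tree node) for nodes not yet in the tree.
    best = {j: (euclidean(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda v: best[v][0])
        weight, parent = best.pop(j)
        adj[j].append((parent, weight))
        adj[parent].append((j, weight))
        for v in best:
            d = euclidean(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return adj


def farthest(adj, start):
    """Return (node, path) for the tree node farthest from `start`."""
    stack = [(start, None, 0.0, [start])]
    far = (0.0, start, [start])
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > far[0]:
            far = (dist, node, path)
        for neighbor, weight in adj[node]:
            if neighbor != parent:
                stack.append((neighbor, node, dist + weight, path + [neighbor]))
    return far[1], far[2]


def pseudotime_order(points):
    """Order cells along the longest path of the minimum spanning tree.

    Cells on side branches are not part of the returned path, and the
    direction of the path (which end is "early") is not determined.
    """
    adj = minimum_spanning_tree(points)
    end, _ = farthest(adj, 0)
    _, path = farthest(adj, end)
    return path


# Hypothetical cells lying roughly along one trajectory, in shuffled order.
toy_points = [[2, 2], [0, 0], [4, 4.5], [1, 1.2], [3, 3.1]]
toy_order = pseudotime_order(toy_points)
```

Because the direction is arbitrary, an external marker (for example a
known stem cell gene) would be needed to decide which end of the path
corresponds to the start of differentiation.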

Finally, by defining links between gene pairs, e.g., based on 
the significance of correlation, a variety of network analysis 
methods can be applied (Ocone et al., 2015). 

There is certainly room for further development of computa- 
tional methods to infer cell lineages. This inference is particularly 
challenging if the lineage tree segregates into multiple branches, 
since technical and biological gene expression noise can 
confound the assignment of a cell to a particular lineage. The 
single-cell perspective will yield exciting new insights into the 
impact of gene expression noise on lineage commitment and 
on the regulation of gene expression noise during differentiation.

Measuring Gene Expression Noise

Another application of single-cell mRNA sequencing is the investigation
of biological gene expression variability, or gene expression noise, in
a population of cells. Current models of transcriptional dynamics
describe promoter bursting (Figure 4A), where the promoter of a gene
switches between an active and an inactive state and, once activated,
initiates transcript production at a constant rate (Raj et al., 2006;
Raser and O’Shea, 2004). These dynamics imply a variance in transcript
levels exceeding the lower limit of pure sampling, i.e., Poissonian
noise. Single-cell mRNA sequencing is a suitable method to infer the
biological noise and investigate transcriptional parameters on a
genome-wide level in a cell population of interest. However, technical
noise due to sampling of transcripts to be sequenced from each cell and
due to global cell-to-cell variability in sequencing efficiency (Figure
4B) contributes substantially to the measured noise levels (Brennecke et
al., 2013; Grün et al., 2014). The technical noise component can be
quantified, for instance, based on sample-to-sample variability in
spike-in RNA levels. After fitting a technical noise model that
incorporates sampling noise and global sample-to-sample variability in
sequencing efficiency, the technical noise component can be deconvolved
from the total noise in order to infer the biological noise component
(Figure 4C) (Grün et al., 2014). This approach has been shown to yield
precise noise estimates consistent with single-molecule FISH, a highly
sensitive imaging-based method for transcript counting (Raj et al.,
2008), and can be used, for instance, to measure changes in biological
noise between different conditions. Furthermore, given a model of
transcriptional bursting, kinetic model parameters such as burst size
and burst frequency can be derived from the biological noise
estimates (Grün et al., 2014).

Figure 4. Single-Cell Sequencing Reveals Biological Gene Expression
Noise

(A) Transcription is not a time-continuous process. Switching of a gene
promoter between an active and an inactive state leads to
transcriptional bursting. The kinetic parameters can be estimated from
burst size and burst frequency, which can be derived from biological
noise estimates measured with single-cell sequencing.

(B) The CV as a function of the mean expression for spike-in RNA or
fixed aliquots of cellular RNA reveals sources of technical noise. While
sampling noise dominates at low expression, global variability of
sequencing efficiency is the major contribution for highly expressed
genes.

(C) Technical noise can be modeled and deconvolved from the transcript
count distribution measured in cells, yielding good estimates of the
actual biological noise (Grün et al., 2014).



Another method allows the identification of highly variable 
genes by assigning a p value to each gene reflecting to what 
extent the biological noise exceeds technical variability (Bren- 
necke et al., 2013). This method also relies on technical noise 
estimates derived from external spike-in RNA. 
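How burst size and burst frequency might follow from a biological noise
estimate can be sketched under one common simplifying assumption: in the
limit of short, infrequent bursts, steady-state transcript counts follow
a negative binomial distribution with mean f*b and variance f*b*(1 + b),
where b is the mean burst size and f the burst frequency in units of the
mRNA degradation rate. The function name and the example numbers are
hypothetical, and the inputs are assumed to be biological estimates from
which technical noise has already been removed.

```python
def burst_parameters(mean, variance):
    """Moment-based burst size and burst frequency.

    Assumes a negative binomial steady state (bursty limit), where the
    Fano factor variance / mean equals 1 + burst_size and the mean equals
    burst_frequency * burst_size. The frequency is expressed in units of
    the mRNA degradation rate.
    """
    fano = variance / mean
    if fano <= 1.0:
        # At or below the Poissonian limit no bursting can be inferred.
        raise ValueError("variance does not exceed the Poissonian limit")
    burst_size = fano - 1.0
    burst_frequency = mean / burst_size
    return burst_size, burst_frequency


# Hypothetical gene with a mean of 20 transcripts and a variance of 120.
example_size, example_frequency = burst_parameters(mean=20.0, variance=120.0)
```

For this hypothetical gene the sketch gives a burst size of 5
transcripts and a frequency of 4 bursts per mRNA lifetime. Equivalently,
the squared coefficient of variation is 1/mean + 1/burst_frequency under
the same assumption.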

Investigating Allelic Expression

Single-cell sequencing offers the possibility to study allelic 
expression on a genome-wide level. If the two alleles of a gene 
differ by a sufficient number of single-nucleotide polymorphisms 
(SNPs), transcripts derived from the two alleles can be distinguished by
single-cell mRNA sequencing. However, this analysis is highly sensitive
to technical noise, i.e., spurious differences in allele frequencies due
to sampling effects, and stringent controls are required to infer actual
biological differences. By analyzing
mouse embryos of mixed genetic background, this approach 
has revealed the presence of abundant random monoallelic 
expression during preimplantation development and has 
demonstrated de novo inactivation of the paternal X chromo- 
some (Deng et al., 2014). 
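Why sampling effects matter for allelic analysis can be illustrated with
an exact binomial test on allele-specific transcript counts. The sketch
assumes that, without a true allelic bias, each sequenced transcript
comes from either allele with probability 0.5; it does not model
allele-specific technical biases, and the function name and example
counts are hypothetical. It is an illustration, not the control used in
the cited study.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    k: transcripts assigned to one allele, n: transcripts assigned to
    either allele, p: expected fraction for that allele. Sums the
    probabilities of all outcomes that are no more likely than the
    observed one.
    """
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties survive floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Three transcripts, all from one allele: compatible with pure sampling.
p_low_count = binomial_two_sided_p(3, 3)
# Twenty transcripts, all from one allele: unlikely under balanced expression.
p_high_count = binomial_two_sided_p(20, 20)
```

With only three detected transcripts, apparent monoallelic expression
yields p = 0.25, so it cannot be distinguished from sampling noise,
whereas the same pattern with twenty transcripts is highly unlikely
under balanced expression.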

Concluding Remarks 

The power of single-cell sequencing as a method to characterize the
state of a cell across multiple molecular layers has been demonstrated
by a number of beautiful studies published during the last few years.
Most of the previous research was focused on the investigation of
single-cell genomes and transcriptomes. While experimental protocols
have improved rapidly, sophisticated computational methods are just
beginning to emerge, and in this Primer, we have summarized a number of
state-of-the-art methods along with general guidelines covering all
analysis stages. We hope that this overview will enable a growing number
of researchers to get the most out of their single-cell sequencing data.
The field of single-cell sequencing will keep developing rapidly in the
near future and will reveal exciting insights into the regulatory
mechanisms that determine the identity of a cell.

REFERENCES 

Alexandrov, L.B., and Stratton, M.R. (2014). Mutational signatures: the pat- 
terns of somatic mutations hidden in cancer genomes. Curr. Opin. Genet. 
Dev. 24, 52-60. 

Anders, S., and Huber, W. (2010). Differential expression analysis for sequence
count data. Genome Biol. 11, R106. 

Baker, S.C., Bauer, S.R., Beyer, R.P., Brenton, J.D., Bromley, B., Burrill, J., 
Causton, H., Conley, M.P., Elespuru, R., Fero, M., et al.; External RNA Controls
Consortium (2005). The External RNA Controls Consortium: a progress report. 
Nat. Methods 2, 731-734. 

Baslan, T., Kendall, J., Rodgers, L., Cox, H., Riggs, M., Stepansky, A., Troge,
J., Ravi, K., Esposito, D., Lakshmi, B., et al. (2012). Genome-wide copy number
analysis of single cells. Nat. Protoc. 7, 1024-1041.

Biesecker, L.G., and Spinner, N.B. (2013). A genomic view of mosaicism and 
human disease. Nat. Rev. Genet. 14, 307-320. 

Brennecke, P., Anders, S., Kim, J.K., Kolodziejczyk, A.A., Zhang, X., Proserpio,
V., Baying, B., Benes, V., Teichmann, S.A., Marioni, J.C., and Heisler,
M.G. (2013). Accounting for technical noise in single-cell RNA-seq experiments.
Nat. Methods 10, 1093-1095.

Buettner, F., Natarajan, K.N., Casale, F.P., Proserpio, V., Scialdone, A.,
Theis, F.J., Teichmann, S.A., Marioni, J.C., and Stegle, O. (2015).
Computational analysis of cell-to-cell heterogeneity in single-cell
RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol.
33, 155-160.

Chattopadhyay, P.K., and Roederer, M. (2012). Cytometry: today’s technology
and tomorrow’s horizons. Methods 57, 251-258. 



808 Cell 163 , November 5, 2015 ©2015 Elsevier Inc. 








Cheung, V.G., and Nelson, S.F. (1996). Whole genome amplification using a 
degenerate oligonucleotide primer allows hundreds of genotypes to be performed
on less than one nanogram of genomic DNA. Proc. Natl. Acad. Sci.
USA 93, 14676-14679. 

Cunningham, F., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., 
Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., et al. (2015). 
Ensembl 2015. Nucleic Acids Res. 43, D662-D669.

Dean, F.B., Hosono, S., Fang, L., Wu, X., Faruqi, A.F., Bray-Ward, P., Sun, Z., 
Zong, Q., Du, Y., Du, J., et al. (2002). Comprehensive human genome amplifi- 
cation using multiple displacement amplification. Proc. Natl. Acad. Sci. USA 
99, 5261-5266. 

Deng, Q., Ramsköld, D., Reinius, B., and Sandberg, R. (2014). Single-cell
RNA-seq reveals dynamic, random monoallelic gene expression in mammalian
cells. Science 343, 193-196.

Eberwine, J., Yeh, H., Miyashiro, K., Cao, Y., Nair, S., Finnell, R., Zettel, M., and 
Coleman, P. (1992). Analysis of gene expression in single live neurons. Proc. 
Natl. Acad. Sci. USA 89, 3010-3014. 

Eldar, A., and Elowitz, M.B. (2010). Functional roles for noise in genetic circuits. 
Nature 467, 167-173. 

Falconer, E., Hills, M., Naumann, U., Poon, S.S.S., Chavez, E.A., Sanders, 
A.D., Zhao, Y., Hirst, M., and Lansdorp, P.M. (2012). DNA template strand 
sequencing of single-cells maps genomic rearrangements at high resolution. 
Nat. Methods 9, 1107-1112. 

Fan, H.C., Wang, J., Potanina, A., and Quake, S.R. (2011). Whole-genome 
molecular haplotyping of single cells. Nat. Biotechnol. 29, 51-57. 

Garber, M., Grabherr, M.G., Guttman, M., and Trapnell, C. (2011).
Computational methods for transcriptome annotation and quantification using
RNA-seq. Nat. Methods 8, 469-477.

Gole, J., Gore, A., Richards, A., Chiu, Y.-J., Fung, H.-L., Bushman, D., Chiang, 
H.-I., Chun, J., Lo, Y.-H., and Zhang, K. (2013). Massively parallel polymerase
cloning and genome sequencing of single cells using nanoliter microwells. Nat.
Biotechnol. 31, 1126-1132.

Grün, D., Kester, L., and van Oudenaarden, A. (2014). Validation of noise
models for single-cell transcriptomics. Nat. Methods 11, 637-640.

Grün, D., Lyubimova, A., Kester, L., Wiebrands, K., Basak, O., Sasaki, N.,
Clevers, H., and van Oudenaarden, A. (2015). Single-cell messenger RNA
sequencing reveals rare intestinal cell types. Nature 525, 251-255.
http://dx.doi.org/10.1038/nature14966.

Haghverdi, L., Buettner, F., and Theis, F.J. (2015). Diffusion maps for
high-dimensional single-cell analysis of differentiation data. Bioinformatics
31, 2989-2998. http://dx.doi.org/10.1093/bioinformatics/btv325.

Hashimshony, T., Wagner, F., Sher, N., and Yanai, I. (2012). CEL-Seq:
single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666-673.

Hashimshony, T., Feder, M., Levin, M., Hall, B.K., and Yanai, I. (2015).
Spatiotemporal transcriptomics reveals the evolutionary history of the endoderm
germ layer. Nature 519, 219-222.

Iscove, N.N., Barbara, M., Gu, M., Gibson, M., Modi, C., and Winegarden, N. 
(2002). Representation is faithfully preserved in global cDNA amplified 
exponentially from sub-picogram quantities of mRNA. Nat. Biotechnol. 20,
940-943. 

Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J.-B., Lönnerberg, P., and
Linnarsson, S. (2011). Characterization of the single-cell transcriptional
landscape by highly multiplex RNA-seq. Genome Res. 21, 1160-1167.

Islam, S., Zeisel, A., Joost, S., La Manno, G., Zajac, P., Kasper, M.,
Lönnerberg, P., and Linnarsson, S. (2014). Quantitative single-cell RNA-seq with
unique molecular identifiers. Nat. Methods 11, 163-166.

Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., 
Mildner, A., Cohen, N., Jung, S., Tanay, A., and Amit, I. (2014). Massively 
parallel single-cell RNA-seq for marker-free decomposition of tissues into 
cell types. Science 343, 776-779.

Junker, J.P., Noel, E.S., Guryev, V., Peterson, K.A., Shah, G., Huisken, J., 
McMahon, A.P., Berezikov, E., Bakkers, J., and van Oudenaarden, A. (2014).
Genome-wide RNA Tomography in the zebrafish embryo. Cell 159, 662-675.



Kharchenko, P.V., Silberstein, L., and Scadden, D.T. (2014). Bayesian 
approach to single-cell differential expression analysis. Nat. Methods 11,
740-742. 

Kivioja, T., Vähärautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S.,
and Taipale, J. (2012). Counting absolute numbers of molecules using unique 
molecular identifiers. Nat. Methods 9, 72-74. 

Klein, C.A., Schmidt-Kittler, O., Schardt, J.A., Pantel, K., Speicher, M.R., and
Riethmüller, G. (1999). Comparative genomic hybridization, loss of
heterozygosity, and DNA sequence analysis of single cells. Proc. Natl. Acad. Sci.
USA 96, 4494-4499.

Klein, A.M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V.,
Peshkin, L., Weitz, D.A., and Kirschner, M.W. (2015). Droplet barcoding for
single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187-1201.

Leung, M.L., Wang, Y., Waters, J., and Navin, N.E. (2015). SNES: single
nucleus exome sequencing. Genome Biol. 16, 55.

Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with 
Burrows-Wheeler transform. Bioinformatics 26, 589-595.

Macaulay, I.C., and Voet, T. (2014). Single cell genomics: advances and future 
perspectives. PLoS Genet. 10, e1004126.

Macosko, E.Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., 
Tirosh, I., Bialas, A.R., Kamitaki, N., Martersteck, E.M., et al. (2015). Highly 
Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter 
Droplets. Cell 161, 1202-1214.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, 
A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. (2010). 
The Genome Analysis Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data. Genome Res. 20, 1297-1303.

Meyer, L.R., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Kuhn, R.M., Wong, M., 
Sloan, C.A., Rosenbloom, K.R., Roe, G., Rhead, B., et al. (2013). The UCSC 
Genome Browser database: extensions and updates 2013. Nucleic Acids 
Res. 41, D64-D69.

Möhlendick, B., Bartenhagen, C., Behrens, B., Honisch, E., Raba, K., Knoefel,
W.T., and Stoecklein, N.H. (2013). A robust method to analyze copy number
alterations of less than 100 kb in single cells using oligonucleotide array
CGH. PLoS ONE 8, e67031.

Munsky, B., Neuert, G., and van Oudenaarden, A. (2012). Using gene
expression noise to understand gene regulation. Science 336, 183-187.

Nagaoka, S.I., Hassold, T.J., and Hunt, P.A. (2012). Human aneuploidy: 
mechanisms and new insights into an age-old problem. Nat. Rev. Genet. 13,
493-504. 

Navin, N., Kendall, J., Troge, J., Andrews, P., Rodgers, L., McIndoo, J., Cook,
K., Stepansky, A., Levy, D., Esposito, D., et al. (2011). Tumour evolution 
inferred by single-cell sequencing. Nature 472, 90-94. 

Nielsen, R., Paul, J.S., Albrechtsen, A., and Song, Y.S. (2011). Genotype 
and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12,
443-451.

Ocone, A., Haghverdi, L., Mueller, N.S., and Theis, F.J. (2015). Reconstructing
gene regulatory dynamics from high-dimensional single-cell snapshot data.
Bioinformatics 31, i89-i96.

Ottolini, C.S., Newnham, L.J., Capalbo, A., Natesan, S.A., Joshi, H.A.,
Cimadomo, D., Griffin, D.K., Sage, K., Summers, M.C., Thornhill, A.R., et al. 
(2015). Genome-wide maps of recombination and chromosome segregation 
in human oocytes and embryos show selection for maternal recombination 
rates. Nat. Genet. 47, 727-735. 

Pan, X., Durrett, R.E., Zhu, H., Tanaka, Y., Li, Y., Zi, X., Marjani, S.L., 
Euskirchen, G., Ma, C., Lamotte, R.H., et al. (2013). Two methods for full- 
length RNA sequencing for low quantities of cells and single cells. Proc. 
Natl. Acad. Sci. USA 110, 594-599.

Patel, A.P., Tirosh, I., Trombetta, J.J., Shalek, A.K., Gillespie, S.M., Wakimoto, 
H., Cahill, D.P., Nahed, B.V., Curry, W.T., Martuza, R.L., et al. (2014).
Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.
Science 344, 1396-1401. 






Picelli, S., Björklund, A.K., Faridani, O.R., Sagasser, S., Winberg, G., and
Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome
profiling in single cells. Nat. Methods 10, 1096-1098.

Pollen, A.A., Nowakowski, T.J., Shuga, J., Wang, X., Leyrat, A.A., Lui, J.H., Li, 
N., Szpankowski, L., Fowler, B., Chen, P., et al. (2014). Low-coverage single- 
cell mRNA sequencing reveals cellular heterogeneity and activated signaling 
pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053-1058. 

Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., and Tyagi, S. (2006). Sto- 
chastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309. 

Raj, A., van den Bogaard, P., Rifkin, S.A., van Oudenaarden, A., and Tyagi, S. 
(2008). Imaging individual mRNA molecules using multiple singly labeled 
probes. Nat. Methods 5, 877-879. 

Ramsköld, D., Luo, S., Wang, Y.-C., Li, R., Deng, Q., Faridani, O.R., Daniels,
G.A., Khrebtukova, I., Loring, J.F., Laurent, L.C., et al. (2012). Full-length 
mRNA-Seq from single-cell levels of RNA and individual circulating tumor 
cells. Nat. Biotechnol. 30, 777-782. 

Raser, J.M., and O’Shea, E.K. (2004). Control of stochasticity in eukaryotic 
gene expression. Science 304, 1811-1814. 

Saliba, A.-E., Westermann, A.J., Gorski, S.A., and Vogel, J. (2014). Single-cell 
RNA-seq: advances and future challenges. Nucleic Acids Res. 42, 8845-8860.

Sasagawa, Y., Nikaido, I., Hayashi, T., Danno, H., Uno, K.D., Imai, T., and
Ueda, H.R. (2013). Quartz-Seq: a highly reproducible and sensitive single- 
cell RNA sequencing method, reveals non-genetic gene-expression heteroge- 
neity. Genome Biol. 14, R31. 

Shalek, A.K., Satija, R., Adiconis, X., Gertner, R.S., Gaublomme, J.T., 
Raychowdhury, R., Schwartz, S., Yosef, N., Malboeuf, C., Lu, D., et al. 
(2013). Single-cell transcriptomics reveals bimodality in expression and 
splicing in immune cells. Nature 498, 236-240. 

Shalek, A.K., Satija, R., Shuga, J., Trombetta, J.J., Gennert, D., Lu, D., Chen, 
P., Gertner, R.S., Gaublomme, J.T., Yosef, N., et al. (2014). Single-cell 
RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510, 
363-369. 

Snijder, B., and Pelkmans, L. (2011). Origins of regulated cell-to-cell variability. 
Nat. Rev. Mol. Cell Biol. 12, 119-125. 

Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., Wang, X., 
Bodeau, J., Tuch, B.B., Siddiqui, A., et al. (2009). mRNA-Seq whole-transcrip- 
tome analysis of a single cell. Nat. Methods 6, 377-382. 

Tang, F., Barbacioru, C., Bao, S., Lee, C., Nordman, E., Wang, X., Lao, K., and 
Surani, M.A. (2010). Tracing the derivation of embryonic stem cells from the 
inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468-478.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren,
M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly
and quantification by RNA-Seq reveals unannotated transcripts and isoform 
switching during cell differentiation. Nat. Biotechnol. 28, 511-515. 

Trapnell, C., Cacchiarelli, D., Grimsby, J., Pokharel, P., Li, S., Morse, M.,
Lennon, N.J., Livak, K.J., Mikkelsen, T.S., and Rinn, J.L. (2014). The dynamics
and regulators of cell fate decisions are revealed by pseudotemporal ordering of
single cells. Nat. Biotechnol. 32, 381-386.

Treutlein, B., Brownfield, D.G., Wu, A.R., Neff, N.F., Mantalas, G.L., Espinoza,
F.H., Desai, T.J., Krasnow, M.A., and Quake, S.R. (2014). Reconstructing
lineage hierarchies of the distal lung epithelium using single-cell RNA-seq.
Nature 509, 371-375.

Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., and Domany, E. 
(2005). Sorting points into neighborhoods (SPIN): data analysis and visualiza- 
tion by ordering distance matrices. Bioinformatics 21, 2301-2308. 

Van der Aa, N., Cheng, J., Mateiu, L., Zamani Esteki, M., Kumar, P., Dimitria- 
dou, E., Vanneste, E., Moreau, Y., Vermeesch, J.R., and Voet, T. (2013). 
Genome-wide copy number profiling of single cells in S-phase reveals DNA- 
replication domains. Nucleic Acids Res. 41, e66. 

Van der Maaten, L., and Hinton, G. (2008). Visualizing Data using t-SNE. 
J. Mach. Learn. Res. 9, 2579-2605.

Venkatraman, E.S., and Olshen, A.B. (2007). A faster circular binary segmen- 
tation algorithm for the analysis of array CGH data. Bioinformatics 23, 
657-663. 

Voet, T., Kumar, P., Van Loo, P., Cooke, S.L., Marshall, J., Lin, M.-L., Zamani 
Esteki, M., Van der Aa, N., Mateiu, L., McBride, D.J., et al. (2013). Single-cell 
paired-end genome sequencing reveals structural variation per cell cycle. 
Nucleic Acids Res. 41, 6119-6138. 

Wang, J., Fan, H.C., Behr, B., and Quake, S.R. (2012). Genome-wide single- 
cell analysis of recombination activity and de novo mutation rates in human 
sperm. Cell 150, 402-412. 

Wang, Y., Waters, J., Leung, M.L., Unruh, A., Roh, W., Shi, X., Chen, K., 
Scheet, P., Vattathil, S., Liang, H., et al. (2014). Clonal evolution in breast can- 
cer revealed by single nucleus genome sequencing. Nature 512, 155-160. 

Zeisel, A., Muñoz-Manchado, A.B., Codeluppi, S., Lönnerberg, P., La Manno,
G., Juréus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., et al. (2015).
Brain structure. Cell types in the mouse cortex and hippocampus revealed
by single-cell RNA-seq. Science 347, 1138-1142.

Zhang, L., Cui, X., Schmitt, K., Hubert, R., Navidi, W., and Arnheim, N. (1992).
Whole genome amplification from a single cell: implications for genetic
analysis. Proc. Natl. Acad. Sci. USA 89, 5847-5851.

Zhang, C., Zhang, C., Chen, S., Yin, X., Pan, X., Lin, G., Tan, Y., Tan, K., Xu, Z., 
Hu, P., et al. (2013). A single cell level based method for copy number variation 
analysis by low coverage massively parallel sequencing. PLoS ONE 8, e54236. 

Zhao, M., Wang, Q., Wang, Q., Jia, P., and Zhao, Z. (2013). Computational 
tools for copy number variation (CNV) detection using next-generation 
sequencing data: features and perspectives. BMC Bioinformatics 14
(Suppl 11), S1.

Zong, C., Lu, S., Chapman, A.R., and Xie, X.S. (2012). Genome-wide detection 
of single-nucleotide and copy-number variations of a single human cell. Sci- 
ence 338, 1622-1626. 







Leading Edge
Review

Hippo Pathway in Organ Size Control,
Tissue Homeostasis, and Cancer

Fa-Xing Yu,1,* Bin Zhao,2 and Kun-Liang Guan3,*
1Children’s Hospital and Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
2Life Sciences Institute and Innovation Center for Cell Signaling Network, Zhejiang University, Hangzhou, Zhejiang 310058, China
3Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
*Correspondence: fxyu@fudan.edu.cn (F.-X.Y.), kuguan@ucsd.edu (K.-L.G.)
http://dx.doi.org/10.1016/j.cell.2015.10.044



Two decades of studies in multiple model organisms have established the Hippo pathway as a key
regulator of organ size and tissue homeostasis. By inhibiting the YAP and TAZ transcription
co-activators, the Hippo pathway regulates cell proliferation, apoptosis, and stemness in
response to a wide range of extracellular and intracellular signals, including cell-cell contact,
cell polarity, mechanical cues, ligands of G-protein-coupled receptors, and cellular energy
status. Dysregulation of the Hippo pathway exerts a significant impact on cancer development.
Further investigation of the functions and regulatory mechanisms of this pathway will help
uncover the mystery of organ size control and identify new targets for cancer treatment.



The emergence of multicellular organisms is an evolutionary 
milestone. Among the most fundamental mechanisms support- 
ing multicellularity are those that ensure the proper size and 
shape of tissues and organs to meet the need of functionality. 
However, despite intensive investigations into the underlying 
principles behind a “preset” size of organs, we are far from hav- 
ing a clear picture of this basic question in developmental 
biology. Nevertheless, investigations of the Hippo pathway in
organ size control have shed light on this mystery (Halder and
Johnson, 2011; Pan, 2010; Yu and Guan, 2013).

In 1995, two studies in Drosophila discovered that deletion of
the Warts (wts) gene resulted in dramatic overgrowth of multiple
tissues (Justice et al., 1995; Xu et al., 1995). Several years later, a
flurry of studies showed that Salvador (sav) (Kango-Singh 
et al., 2002; Tapon et al., 2002), Hippo (hpo) (Harvey et al., 
2003; Jia et al., 2003; Pantalacci et al., 2003; Udan et al., 2003; 
Wu et al., 2003), and Mob as tumor suppressor (mats) (Lai 
et al., 2005) mutant Drosophila phenocopy wts mutants with
regard to tissue overgrowth. Hpo, Sav, Wts, and Mats interact
genetically and physically, and the remarkable organ size pheno- 
type elicited by their mutation is unprecedented in other estab- 
lished developmental pathways; thus, they were grouped into 
a new signaling module— the Hippo pathway— named after the 
enormous size of hpo mutant organs, which resembles that of 
a hippopotamus. Yorkie (yki), the key functional effector of the 
Hippo pathway in organ size regulation, was soon discovered 
in a screen for Wts-interacting proteins (Huang et al., 2005). 

The Hippo pathway is highly conserved in mammals. The
mammalian orthologs of Hpo, Sav, Wts, Mats, and Yki are
Mammalian sterile 20-like 1/2 (MST1/2, also called STK4/3),
Salvador (SAV1), Large tumor suppressor homolog 1/2 (LATS1/2),
MOB kinase activator 1A/B (MOB1A/B), and Yes-associated
protein (YAP)/transcriptional co-activator with PDZ binding motif
(TAZ, also called WWTR1), respectively (Halder and Johnson,
2011; Pan, 2010; Yu and Guan, 2013) (Table 1). The Hippo
pathway quickly attracted broad attention due to its remarkable
potency in regulating organ size, as well as its apparent
relevance to tissue regeneration and cancer. Here, with an emphasis
on recent developments, we review the current understanding of
the organization, regulation, and function of the Hippo pathway
and discuss some key open questions.

YAP/TAZ and Yki as Major Effectors of Hippo Signaling 

In Drosophila, deleting yki suppresses the overgrowth pheno- 
types of hpo, sav, or wts mutants (Huang et al., 2005). In 
mice, deleting Yap also diminishes the overgrowth phenotypes 
caused by deficiency of Mst1/2 or other upstream regulators 
(Zhang et al., 2010; Zhou et al., 2011). Thus, Yki and YAP/TAZ 
are the evolutionarily conserved key effectors of the Hippo 
pathway. 

Yki and YAP/TAZ are believed to mediate the biological func- 
tions of the Hippo pathway by regulating gene transcription. As 
transcriptional co-activators, Yki and YAP/TAZ cannot bind 
DNA directly, and they must interact with DNA-binding transcrip- 
tion factors to regulate target gene expression. In Drosophila, 
Scalloped (Sd) is a transcription factor and key partner of Yki 
that mediates the function of Yki in tissue growth (Goulev 
et al., 2008; Wu et al., 2008; Zhang et al., 2008; Zhao et al., 
2008). Similarly, in mammalian cells, the TEAD family transcrip- 
tion factors (TEAD1-4, orthologs of Sd) are key partners for 
YAP (Zhao et al., 2008). Several lines of evidence indicate that 
TEADs are the major partners of YAP in transcriptional regula- 
tion. For instance, a TEAD-binding-deficient YAP mutant lost 
its ability to induce transcription of known YAP target genes 
(Zhao et al., 2008), and knockin of the TEAD-binding-deficient 
YAP (Y94A mutant) phenotypically mimics YAP knockout in the 
skin and heart (Schlegelmilch et al., 2011; von Gise et al., 
2012). In addition, the majority of YAP and TEAD occupied sites 
in the genome are shared (Stein et al., 2015; Zanconato et al., 
2015; Zhao et al., 2008), and when TEAD is fused with a VP16

Table 1. Hippo Pathway Components and Major Functions

Drosophila | Mammals | Major Functions in Hippo Pathway
Hippo (Hpo) | MST1/2 | Phosphorylates LATS1/2, MOB1, and SAV1, leading to LATS1/2 activation
Salvador (Sav) | SAV1 | Interacts with MST1/2, promotes phosphorylation of LATS1/2 by MST1/2
Warts (Wts) | LATS1/2 | Phosphorylates and inactivates YAP/TAZ
Mats | MOB1A/B | Scaffold protein of LATS1/2
Yorkie (Yki) | YAP/TAZ | Transcription co-activator, major effector of the Hippo pathway
Scalloped (Sd) | TEAD1-4 | Transcription factors that mediate the effect of YAP/TAZ
Tgi | VGLL4 | Competes with YAP/TAZ for TEADs, inhibits YAP/TAZ functions
Misshapen (Msn) | MAP4K4/6/7 | Phosphorylates and activates LATS1/2
Merlin (Mer), Kibra, Expanded (Ex) | Merlin/NF2, KIBRA, FRMD6? | May form a complex and mediate upstream signals (from plasma membrane) to MST1/2; NF2 may bring LATS1/2 to plasma membrane and facilitate its activation by MST1/2
- | AMOT | Sequesters YAP/TAZ to cell junctions, binding and indirectly activating LATS1/2; a substrate of LATS1/2



transactivation domain, it produces a gene expression profile 
similar to that induced by active YAP (Ota and Sasaki, 2008). 
TEAD1-4 or Sd can bind to a consensus motif similar to 
the GTIIC sequence (TGGAATGT or ACATTCCA), and transcrip- 
tion reporters under control of GTIIC concatemers are now 
widely used to measure Hippo pathway activity (Dupont et al., 
2011; Mohseni et al., 2014; Ota and Sasaki, 2008). Besides
TEAD1-4, YAP/TAZ have also been shown to interact with other
transcription factors, including Smad, RUNX1/2, p63/p73, and
ErbB4 (reviewed in Varelas, 2014), although the functional
significance of these transcription factors in the Hippo pathway is less
clear.
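
As an illustration of the GTIIC-type motif described above, the following minimal Python sketch scans a DNA sequence on both strands for the TGGAATGT core. It is a sketch, not a published analysis method. The function names and the example sequence are hypothetical, and exact string matching is a simplifying assumption, since real TEAD/Sd sites are only similar to this consensus.

```python
# Minimal sketch: locate exact GTIIC-like core motifs on both strands.
# Names and the example sequence are illustrative assumptions.

GTIIC = "TGGAATGT"  # core motif quoted in the text


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_gtiic_sites(seq, motif=GTIIC):
    """Return (0-based start, strand) for each exact motif match."""
    seq = seq.upper()
    rc_motif = reverse_complement(motif)  # ACATTCCA for the default motif
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    example = "CCTGGAATGTCCACATTCCAGG"  # made-up sequence with one site per strand
    print(find_gtiic_sites(example))
```

A practical scan would normally use a position weight matrix rather than exact matching, so this sketch will miss degenerate sites.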

A strong transcriptional activation domain is present in YAP/ 
TAZ but absent in Yki. Nevertheless, human YAP can rescue 
the lethality resulting from Hippo pathway hyperactivation in 
Drosophila, indicating a functional conservation (Huang et al., 
2005). Genome-wide assessment of chromatin binding status 
reveals that, in addition to occupancy at proximal promoters of 
target genes, YAP and TEAD largely exert their transcriptional 
activity by interacting with distal enhancers, suggesting that 
YAP, and probably TAZ and Yki, may regulate transcription via 
multiple mechanisms, including recruitment of general transcrip- 
tion factors, modification of epigenetic markers, and modulation 
of chromatin structure (Lian et al., 2010; Stein et al., 2015; Zan- 
conato et al., 2015) (Figure 1). Indeed, recent evidence shows 
that Yki can interact with the Brahma complex, GAGA factors, 
nuclear receptor coactivator 6 (NCoA6, a subunit of the Tri- 
thorax-related histone methyltransferase), and the Mediator 
complex (Jin et al., 2013; Oh et al., 2013; Qing et al., 2014), 
and TAZ can interact with the chromatin-remodeling complex 
SWI/SNF (Skibinski et al., 2014). 

The transcriptional activity of Yki and YAP/TAZ is regulated in
the nucleus by Tondu-domain-containing growth inhibitor (Tgi)
and Vestigial-like family member 4 (VGLL4, an ortholog of Tgi).
Tgi can directly compete with Yki for Sd binding, resulting in
inhibition of Yki-regulated transcription (Guo et al., 2013; Koontz
et al., 2013). When Hippo signaling is on, Tgi and Sd form a
complex, leading to transcriptional repression; conversely, when
Hippo signaling is off, Yki enters the nucleus and displaces Tgi
from Sd, leading to expression of Yki target genes and tissue
growth (Koontz et al., 2013). In mammals, VGLL4 similarly
competes with YAP/TAZ for TEAD binding (Jiao et al., 2014; Zhang
et al., 2014b) (Figure 1). However, whether YAP/TAZ function
simply by relieving a default repression of TEAD by VGLL4 has
not been demonstrated. Interestingly, the expression of VGLL4
appears to be repressed by miR-130a, a microRNA (miRNA)
that is directly induced by YAP, leading to amplification of YAP
activity (Shen et al., 2015). A similar mechanism is also present
in Drosophila, in which Bantam, a well-known Yki-induced
miRNA (Nolo et al., 2006; Thompson and Cohen, 2006), can
repress expression of Tgi (Shen et al., 2015).

Although their role as transcriptional co-activators is widely 
appreciated, YAP/TAZ and Yki may also repress expression of 
certain genes when bound to specific factors. For instance, 
YAP/TAZ can interact with the nucleosome-remodeling and his- 
tone deacetylase (NuRD) complex, resulting in transcriptional 
repression (Kim et al., 2015). Moreover, YAP has been identified 
as a regulator for global miRNA biogenesis via modulation of 
miRNA-processing enzymes Microprocessor or Dicer complex, 
suggesting a transcription-independent role for YAP (Chaulk 
et al., 2014; Mori et al., 2014). Hence, YAP/TAZ and Yki may
regulate gene expression via multiple mechanisms.

Gene expression signatures under YAP/TAZ and Yki hyperac- 
tivation (or ectopic expression) in Drosophila and different 
mammalian cell types have been profiled by independent 
studies. However, the overlap between these different gene 
profiling studies is not high, suggesting that YAP/TAZ and Yki 
may regulate target gene expression in a tissue- or cell-type- 
specific manner. In Drosophila, some common Yki target genes 
are Bantam, diap1, and cyclin E, which may mediate the effect of
Yki in inhibiting cell death and promoting cell proliferation (Huang 
et al., 2005; Nolo et al., 2006; Tapon et al., 2002; Thompson and 
Cohen, 2006). In mammals, connective tissue growth factor 
(CTGF) is a commonly used marker of YAP activation (Zhao 
et al., 2008), and many genes involved in cell proliferation, cell 
adhesion, and cell migration have also been identified as YAP 
targets (Stein et al., 2015). It is likely that a panel of genes works
together to exert the biological functions of YAP/TAZ.

Figure 1. Inhibition of YAP/TAZ Transcriptional Coactivators by LATS1/2

(Left) When Hippo signaling is off, YAP/TAZ enter 
the nucleus, compete with VGLL4 for TEADs, and 
recruit other factors to induce gene transcription. 
YAP/TAZ may bind proximal promoters or distal 
enhancers of target genes to induce transcription. 
(Right) When Hippo signaling is on, YAP/TAZ are 
phosphorylated by LATS1/2 on multiple sites, re- 
sulting in interaction with 14-3-3 and cytoplasmic 
retention; phosphorylation also leads to YAP/TAZ 
poly-ubiquitination and degradation. VGLL4 in- 
teracts with TEADs and represses target gene 
transcription. 




MST1/2 and LATS1/2 Kinases Restrict YAP Activity 

In the Hippo pathway, LATS1/2 directly phosphorylate and
inhibit YAP/TAZ. Interestingly, YAP has five LATS1/2 target
consensus motifs (HXRXXS), four of which are conserved in
TAZ (Zhao et al., 2010). Phosphorylation of YAP on serine 127
(S127) generates a 14-3-3 binding site, and binding with
14-3-3 sequesters YAP in the cytoplasm (Dong et al., 2007;
Zhao et al., 2007). In addition, phosphorylation of YAP on serine
381 (S381) triggers a subsequent phosphorylation by casein
kinase 1 (CK1δ/ε) and activation of a phosphodegron, resulting
in recruitment of an E3 ligase, ubiquitination, and proteasomal
degradation of YAP (Zhao et al., 2010). Thus, by
regulating both YAP subcellular localization and protein stability,
LATS1/2 ensure spatial and temporal control of YAP activity
(Figure 1). TAZ is regulated by LATS1/2 in a similar fashion,
although degradation plays a more prominent role in TAZ
regulation, possibly due to an additional phosphodegron at its
N terminus (Lei et al., 2008; Liu et al., 2010a). The subcellular
localization of Yki is regulated similarly by Wts phosphorylation
and 14-3-3 binding. However, the phosphodegron and the
phosphorylation-mediated degradation mechanisms are not
conserved in Yki (Huang et al., 2005; Oh and Irvine, 2008). In
addition, YAP and Yki have also been shown to be degraded
via the autolysosomal pathway (Kwon et al., 2013; Liang et al.,
2014), suggesting a potential role of YAP and Yki in vesicular
membrane dynamics and related cellular processes such as
autophagy.
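
The HXRXXS consensus mentioned above can be illustrated with a short Python sketch that lists candidate LATS1/2 target serines in a protein sequence. This is only a sketch under stated assumptions. The function name and the toy peptide are hypothetical (the peptide is not a real YAP or TAZ sequence), and a consensus match does not establish that a site is phosphorylated in cells.

```python
# Minimal sketch: find HXRXXS consensus matches in a protein sequence.
# The peptide below is a made-up example, not a real YAP/TAZ sequence.
import re

# A lookahead is used so that overlapping matches are also reported.
LATS_MOTIF = re.compile(r"(?=(H.R..S))")


def find_lats_motifs(protein):
    """Return (1-based serine position, matched motif) for each HXRXXS hit."""
    protein = protein.upper()
    hits = []
    for match in LATS_MOTIF.finditer(protein):
        # The serine is the sixth residue of the six-residue motif.
        serine_position = match.start() + 6
        hits.append((serine_position, match.group(1)))
    return hits


if __name__ == "__main__":
    toy_peptide = "AAHVRAHSPPHARGGS"  # hypothetical sequence with two motifs
    for position, motif in find_lats_motifs(toy_peptide):
        print(f"candidate serine at position {position}: {motif}")
```

Reporting the serine position rather than the motif start keeps the output in the same form as the site names used in the text, such as S127.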

LATS1/2 are activated by MST1/2 through several mecha- 
nisms. MST1/2 phosphorylate LATS1/2 at the C-terminal hy- 
drophobic motif, which promotes LATS1/2 autophosphoryla- 
tion at its activation loop. Furthermore, phosphorylation of 
MOB1 by MST1/2 enhances MOB1 interaction with the autoin- 
hibitory domain of LATS1/2, leading to full activation of LATS1/ 
2 (Callus et al., 2006; Chan et al., 2005; Praskova et al., 2004; 
Wu et al., 2003). In addition, SAV1 is also phosphorylated by
MST1/2, and SAV1 functions as a partner of MST1/2 in
promoting LATS1/2 phosphorylation (Callus et al., 2006; Tapon
et al., 2002) (Figure 1). In Drosophila, Wts is regulated by Hpo,
Sav, and Mats by a similar mechanism (Wehr et al., 2013).

Merlin (Mer), Expanded (Ex), and Kibra, 
three cell-cortex-localized and cytoskel- 
eton-interacting proteins, may function 
as a scaffold for core Hippo components at the apical domain 
for activation, as Sav, Hpo, and Wts have been shown to physi- 
cally interact with Ex/Mer/Kibra (Baumgartner et al., 2010; Gen- 
evet et al., 2010; Hamaratoglu et al., 2006; Yu et al., 2010) (Table 
1). In addition, the effect of Ex/Mer/Kibra on the Hippo pathway 
may be mediated by Tao kinase 1 (Tao-1), which can directly 
phosphorylate and activate Hpo (Boggiano et al., 2011; Poon 
et al., 2011). In mammalian cells, Neurofibromin 2 (NF2, Mer or- 
tholog) appears to play a more direct role in regulating LATS1/2 
activity: it can directly interact with LATS1/2 and recruit LATS1/2 
to the plasma membrane for activation by MST1/2 (Yin et al., 
2013; Zhang et al., 2010) (Figures 2 and 3). 

Hpo or MST1/2 are not absolutely required for regulation of 
Wts or LATS1/2. It has been observed that, in mouse embryonic 
fibroblast (MEF) cells, MST1/2 double knockout did not abolish 
YAP phosphorylation, suggesting the existence of additional 
Hippo-like activity (Zhou et al., 2009). Indeed, a recent study 
in Drosophila has identified Misshapen (Msn) as another 
kinase responsible for Wts activation. This mechanism is also 
conserved in mammals, as MAP4K4 (Msn ortholog) overexpres- 
sion promotes phosphorylation of LATS1/2 (Li et al., 2014), and 
MAP4K4 knockdown induces activity of a YAP reporter (Moh- 
seni et al., 2014). In addition to MAP4K4, two recent studies re- 
vealed that many MAP4K family kinases, including MAP4K1/2/ 
3/5 (Happyhour in Drosophila) and MAP4K4/6/7 (Msn in 
Drosophila), can directly phosphorylate and activate LATS1/2 
(Meng et al., 2015; Zheng et al., 2015). These kinases, together 
with MST1/2, may regulate LATS1/2 activity in a tissue- and 
signal-specific manner. It is also possible that additional ki- 
nases, especially some STE20 family members, may activate 
LATS1/2 in response to different upstream signals or in different 
tissue contexts (Figures 2 and 3). 

YAP/TAZ have also been shown to be phosphorylated by
many other kinases such as cyclin-dependent kinase 1
(CDK1), Jun N-terminal kinases (JNK), homeodomain-interacting
protein kinases (HIPK), ABL, and Src family tyrosine kinases
(reviewed in Varelas, 2014), suggesting that YAP/TAZ can be

Figure 2. Regulation of the Hippo Pathway in Drosophila

The Hippo pathway is regulated by cell-adhesion
molecules (Ed), determinants of cell polarity (Crb,
Fat/Ds, Scrib complex), and mechanical cues
(spectrin, F-actin, or cellular tension). In addition,
Hpo is regulated by Tao, salt-induced kinase (Sik),
Ras-associated factor (Rassf), and the striatin-interacting
phosphatase and kinase (STRIPAK) complex; Wts
is regulated by Zyxin (Zyx) and Jub; Yki is regulated
by WW-domain-binding protein 2 (Wbp2),
Hipk, and multiple ankyrin repeats single-KH
domain (Mask) (refer to Varelas, 2014). Arrows,
blunt ends, and dashed lines indicate activation,
inhibition, and indirect regulation, respectively.




regulated by mechanisms independent of Hippo pathway ki- 
nases. 

Cell Polarity and Cell Adhesion Regulate Hippo Signaling 

In searching for upstream regulators of the Hippo pathway, 
many proteins involved in cell polarity and cell adhesion have 
been identified. Echinoid (Ed), a cell adhesion molecule in
Drosophila, can interact with and stabilize Sav, leading to
activation of Hpo (Yue et al., 2012). In mammalian cells, several
proteins at adherens and tight junctions, such as Angiomotin
(AMOT), protein tyrosine phosphatase non-receptor type 14
(PTPN14), and α-catenin, can sequester YAP/TAZ at cell
junctions (reviewed in Yu and Guan, 2013). Therefore, cell adhesion
and formation of intercellular junctions serve as a mechanism to
repress YAP/TAZ transcriptional activity (Figures 2 and 3).

Crumbs (Crb), a component of apical-basal polarity, interacts 
with Ex, which is critical for the apical localization of Ex/Mer/ 
Kibra complex (Chen et al., 2010; Ling et al., 2010; Robinson 
et al., 2010). In addition, Scribble (SCRIB) interacts with both
MST1/2 and LATS1/2, thus promoting LATS1/2 activation (Cor- 
denonsi et al., 2011; Mohseni et al., 2014). Fat, a protocadherin 
that plays a key role in planar cell polarity, also activates the 
Hippo pathway, possibly through regulating Ex protein levels 
and localization (Bennett and Harvey, 2006; Silva et al., 2006; 
Tyler and Baker, 2007; Willecke et al., 2006) (Figure 2). However, 
mammalian Fat orthologs do not seem to be major regulators of 
the Hippo pathway (Sharma and McNeill, 2013). It is noteworthy 
that the link between cell polarity and the Hippo pathway may be 
indirect, and some proteins, such as Fat, may regulate cell po- 
larity and Hippo signaling via different mechanisms (Matakatsu 
and Blair, 2012). 



Cell Contact and Mechanical Cues 
Regulate Hippo Signaling 

Cells in solid tissues communicate with neighboring cells and
their extracellular matrix (ECM) and perceive constant physical
signals from their local environment. Cell-cell contact was
discovered as the first signal regulating the Hippo pathway
(Zhao et al., 2007). In a sparse culture, YAP/TAZ are primarily
localized in the nucleus to promote target gene transcription and
cell proliferation. On the contrary, at high cell density, YAP/TAZ
are primarily cytoplasmic, corresponding to growth inhibition. It
is known that a cell ceases to proliferate following physical
contact with surrounding cells, and loss of cell contact inhibition
is an indicator of oncogenic transformation. Thus, regulation of
YAP/TAZ by cell density suggests a critical role for the Hippo
pathway in contact inhibition, tissue growth, and tumorigenesis.

Mechanical cues, such as ECM stiffness and cell geometry,
are also potent regulators of YAP/TAZ (Dupont et al., 2011)
(Figure 3). When cells are grown on a stiff matrix or are spread
across a large surface, YAP/TAZ are activated. In contrast,
when cells are seeded on a soft matrix or are compressed into
a small area, YAP/TAZ are inactivated. ECM stiffness and cell
geometry are important for cell proliferation and differentiation,
and YAP/TAZ activity plays a role in these cellular processes.
Cell geometry has also been proposed as a mechanism
underlying Hippo pathway regulation by cell density: at low density,
cells are flat and spread, leading to YAP activation; whereas at high
density, cells adopt a round and compact geometry, resulting
in YAP inactivation (Aragona et al., 2013; Wada et al., 2011).
As further support for a role of YAP/TAZ in mechanosensing,
mechanical strain and shear stress have been shown to stimulate
YAP/TAZ, and YAP activation is required for mechanical
strain-induced cell-cycle reentry (Benham-Pyle et al., 2015; Codelia
et al., 2014; Kim et al., 2014). In a medaka fish model, YAP
also mediates changes in three-dimensional body shape in
response to tissue tension (Porazinski et al., 2015).

The attachment of cells to the ECM is an important mechanism 
for maintaining cell survival, and cells normally undergo anoikis 
when detached from the ECM. The attachment status of a cell 
to its ECM can also regulate Hippo pathway activity. YAP is

Figure 3. Regulation of the Hippo Pathway in Mammals

The Hippo pathway is regulated by diverse signals:
(1) determinants of cell polarity and cell-cell
junctions, such as SCRIB, which interacts with
MST1/2 and LATS1/2, AMOT, PTPN14, and
α-catenin, which can sequester YAP/TAZ to cell
junctions; (2) mechanical cues, such as stiffness,
cell contact, cell geometry, and cell attachment
status that regulate the Hippo pathway by modulating
activity of Rho GTPases, remodeling the
actin cytoskeleton, or altering cellular tension;
both apical and basolateral spectrin networks may
function as sensors for mechanical cues in Hippo
pathway regulation; (3) soluble factors, especially
ligands for GPCRs, regulate LATS1/2 likely
through Rho GTPases and actin dynamics; (4)
metabolic status, such as cellular energy and
oxygen stress, also regulate Hippo signaling;
many other proteins such as protein phosphatase
2A (PP2A), protein phosphatase 1 (PP1), WBP2,
CDK1, MASK, and HIPK can also regulate activities
of different Hippo pathway components (refer
to Varelas, 2014). Arrows, blunt ends, and dashed
lines indicate activation, inhibition, and indirect
regulation, respectively.



activated during the process of cell attachment but inactivated 
when cells are detached (Zhao et al., 2012). Expression of consti- 
tutively active YAP promotes survival of detached cells, suggest- 
ing that cancer cells with high YAP activity may escape anoikis 
and undergo metastasis. In support of this notion, LATS1/2
expression was found to be selectively reduced in metastatic, 
but not primary, prostate tumors (Zhao et al., 2012). 

Soluble Factors Regulate Hippo Signaling 

The primary function of YAP/TAZ is to promote growth, and many
mitogenic hormones and growth factors act through
G-protein-coupled receptors (GPCRs) to induce cell proliferation (Figure 3).
Ligands that signal through GPCRs coupled to Gα12/13 or Gαq/11, such
as lysophosphatidic acid, thrombin, angiotensin II, and estrogen,
can activate YAP/TAZ; in contrast, ligands that signal through
Gαs-coupled GPCRs and protein kinase A (PKA), such as
epinephrine and glucagon, can repress YAP/TAZ activity (Kim
et al., 2013; Miller et al., 2012; Mo et al., 2012; Wennmann et al.,
2014; Yu et al., 2013a; Yu et al., 2012; Zhou et al., 2015).
Interestingly, activation of protein kinase C (PKC) by Gαq/11 can either
activate or inhibit YAP, with conventional PKC activating YAP and
novel PKC inhibiting YAP (Gong et al., 2015). The remarkable
differential functions of PKC in YAP regulation provide a mechanism
to explain some of the cell-type-specific responses to PKC
activation. GPCRs form the largest family of membrane receptors,
mediating diverse physiological or pathological responses. The
demonstration of Hippo regulation by GPCRs links the Hippo pathway to a
wide range of upstream signals and biological functions.

The Wnt/β-catenin pathway is a key signaling cascade in
development and carcinogenesis. A destruction complex including
Axin, adenomatous polyposis coli (APC), and glycogen
synthase kinase-3 (GSK3) causes constant degradation of
β-catenin. Wnt stimulation disrupts the destruction complex and leads
to accumulation of β-catenin. Interestingly, recent studies
suggest that YAP/TAZ are also activated by diverse Wnt family
ligands. YAP/TAZ have been shown to be components of the
destruction complex and are regulated by Wnt in a fashion similar
to that of β-catenin (Azzolin et al., 2012, 2014). However, a recent
study suggests that Wnt activates YAP/TAZ via Frizzled (a
GPCR-like Wnt receptor), Gα12/13, Rho GTPases, and LATS1/2 (Park
et al., 2015). In addition, APC has been shown to act as a scaffold
protein for SAV1 and LATS1, and Apc deletion leads to YAP
activation and tumorigenesis (Cai et al., 2015). More studies may be
needed to verify the mechanism of YAP/TAZ activation by Wnt.

Epidermal growth factor (EGF) and insulin have also been 
shown to regulate YAP/Yki activity in cultured mammalian cells 
and Drosophila, and these signals are mediated by the Ras-Raf- 
MAPK signaling cascade or phosphoinositide-dependent kinase 
(PDK1) (Fan et al., 2013; Reddy and Irvine, 2013; Straßburger
et al., 2012). TAZ is also stabilized upon PI3K activation, which 
is mediated by direct phosphorylation by GSK3 (Huang et al., 
2012). However, no significant effect of EGF and IGF on YAP 
was observed in several other studies, and YAP activity appeared 
to be normal in the presence of inhibitors of PI3K or AKT or in PDK1 
null embryonic cells (Yu et al., 2012; Zhao et al., 2007). These dis- 
crepancies could be due to differences in experimental settings or 
cell types and should be clarified by future studies. 

Effect of Cellular Metabolic Status on Hippo Signaling 

Recently, a link between cellular metabolic status and the 
Hippo pathway has been reported (Figure 3). Under energy 
deprivation, such as glucose starvation, the AMP-activated
protein kinase (AMPK) can directly phosphorylate YAP at serine 
61 (S61) and serine 94 (S94) (Mo et al., 2015; Wang et al., 2015). 
Phosphorylation of YAP S94 abolishes the interaction between 
YAP and TEADs, leading to inhibition of YAP activity. In addition, 
energy stress also inhibits YAP by increasing kinase activity 
of LATS1/2 either in an AMPK-dependent or -independent
manner (Mo et al., 2015; Wang et al., 2015). AMPK can
phosphorylate AMOTL1, which in turn facilitates YAP phosphorylation
by LATS1/2 (DeRan et al., 2014). A similar mechanism is also 
present in Drosophila, in which Ampk inactivates Yki and affects 
cell proliferation in the larval central brain and ventral nerve cord
(Gailite et al., 2015). It appears that, during energy stress, both 
AMPK and LATS1/2 are unleashed to restrict YAP activity. These 
findings also suggest that the Hippo pathway may mediate the 
anticancer effect of metformin, which is known to activate 
AMPK (DeRan et al., 2014; Mo et al., 2015). Other than AMPK, 
glucose may also promote YAP/TAZ activity through phospho- 
fructokinase, which stimulates the interaction between YAP/ 
TAZ and TEADs (Enzo et al., 2015). 

YAP/TAZ activity has also been linked to oxygen availability. 
Under hypoxic conditions, the hypoxia-inducible factor 1 (HIF1) 
stimulates expression of SIAH1/2, two E3 ubiquitin ligases. 
SIAH1/2 then promote ubiquitination and degradation of 
LATS2, leading to YAP/TAZ activation (Ma et al., 2015; Xiang
et al., 2014) (Figure 3). In addition, HIF1 directly induces tran- 
scription of TAZ (Xiang et al., 2014), and YAP interacts with 
and stabilizes HIF1 to enhance transcription of HIF1 target genes 
(Ma et al., 2015). 

YAP/TAZ are potent stimulators of cell growth and pro- 
liferation, which are energy-consuming processes. The regu- 
lation of Hippo signaling by AMPK suggests that metabolic 
status can function as a checkpoint for growth-promoting 
activity of YAP/TAZ. Under conditions of nutrient deprivation 
or energy crisis, YAP/TAZ activity needs to be restricted to 
prevent energy exhaustion caused by anabolic processes. 
Oxygen also plays a critical role in cellular metabolism, and
hypoxia is involved in different pathological processes such as
cancer. The link between hypoxia and YAP/TAZ activity indicates a
role for YAP/TAZ in mediating the oncogenic effects of hypoxia.

Actin Cytoskeleton Integrates Upstream Signals 

The actin cytoskeleton and Rho GTPases are not only important 
in maintaining cell morphology, but also play important roles in 
regulating cell proliferation and differentiation (Jaffe and Hall, 
2005). Manipulation of the actin cytoskeleton, such as
overexpression of Rho GTPases or inhibition of Rho by C3 toxin,
dramatically modulates YAP/TAZ activity (Dupont et al., 2011; Yu et al.,
2012; Zhao et al., 2012). Rho GTPases and changes in actin
cytoskeleton dynamics have been demonstrated to be key mediators
of mechanical cues, GPCR ligands, and cell attachment in
regulating the Hippo pathway (Figure 3). Consistently, deletion of
different regulators of the actin cytoskeleton impacts YAP and
Yki activity. For instance, loss of actin-capping proteins or the
Capulet gene (which inhibits actin polymerization) in Drosophila
results in Yki activation and tissue overgrowth (Fernandez et al.,
2011; Sansores-Garcia et al., 2011). Similarly, knockdown of
actin-capping proteins or filamentous actin (F-actin)-severing
proteins (cofilin or gelsolin) in mammalian cells also leads to
YAP activation (Aragona et al., 2013). In general, Rho GTPase
activity and F-actin appear to activate YAP/TAZ, whereas
destabilization of F-actin inhibits YAP/TAZ (Dupont et al., 2011; Miller
et al., 2012; Wada et al., 2011; Yu et al., 2012; Zhao et al., 2012).

Spectrin proteins, in association with short actin filaments, are
organized into an elastic polygonal meshwork that lines the
intracellular side of the plasma membrane. In epithelial cells,
localization of the spectrin network is usually polarized and present in
both apical and basolateral domains. In addition to a supporting
role for cell structure, the spectrin network may transmit diverse
signals from the cell microenvironment to regulate cellular functions
(Bennett and Gilligan, 1993). Recently, three independent
studies revealed a regulatory role of the spectrin network in
the Hippo pathway, as disruption of the spectrin network in
multiple Drosophila tissues leads to activation of Yki and tissue
overgrowth (Deng et al., 2015; Fletcher et al., 2015; Wong et al.,
2015). A similar phenomenon was also observed in mammalian
cells, supporting a role of spectrin in YAP regulation (Figures 2
and 3). Therefore, spectrins or associated actin filaments may
function as a major node for integrating upstream signals, such
as mechanical cues.

Despite its apparent importance, it remains unclear how the 
actin cytoskeleton regulates activity of Hippo pathway core ki- 
nases. One possibility is that multiple Hippo pathway compo- 
nents are enriched at the apical domain via an actin-mediated 
mechanism that facilitates signal transduction, and actin remod- 
eling may reinforce or disrupt the clustering of Hippo pathway 
components. In another scenario, the Hippo pathway could 
be regulated by contractile actomyosin and cellular tension 
(Figure 3). Inhibition of tension-related enzymes, such as non- 
muscle myosin (by Blebbistatin), Rho kinases (ROCK, by 
Y27632), and myosin light-chain kinase (by ML-7), results in 
YAP/TAZ inhibition (Dupont et al., 2011; Wada et al., 2011). However,
how tension is sensed by the Hippo pathway kinases is not
fully understood. Treating cells with small molecules to inhibit 
cellular tension may also affect actin dynamics, so it is difficult 
to separate the effects of actin remodeling and tension gener- 
ated by actomyosin. However, it has been proposed that, in 
Drosophila, Ajuba (Jub) plays a critical role in tension-induced 
Yki regulation. In response to high tension, Jub recruits Wts to 
intercellular junctions via interactions with α-catenin, thereby inhibiting
Wts activity (Rauskolb et al., 2014). In addition, JNK activation
upon mechanical strain has also been shown to repress
LATS1 (Codelia et al., 2014). 

Although it was initially reported that YAP regulation by
mechanical signals is independent of LATS1/2, recent studies
suggest that LATS1/2 are involved in YAP/TAZ regulation by
the actin cytoskeleton (Miller et al., 2012; Wada et al., 2011; Yu
et al., 2012; Zhao et al., 2007, 2012). This discrepancy may be
due to incomplete LATS1/2 depletion in the knockdown experiments
reported in earlier studies. Supporting a role for
LATS1/2, the phosphorylation status and in vitro kinase activity
of LATS1/2 are clearly regulated upon actin cytoskeleton rearrangement,
and the kinetics are similar to those of YAP/TAZ phosphorylation
(Yu et al., 2012; Zhao et al., 2012). The regulation of
Yki/YAP/TAZ by spectrins or cell attachment/detachment has 
been shown to be dependent on Wts and LATS1/2 (Deng 
et al., 2015; Fletcher et al., 2015; Zhao et al., 2012). Moreover, 
during the preimplantation stage of mouse embryo development,
LATS1/2 are essential for YAP regulation by small
molecules targeting the actin cytoskeleton (Kono et al., 2014).
Collectively, mechanical signals most likely act through Wts 
and LATS1/2 to influence the activity and function of Yki and 
YAP/TAZ. 
Negative Feedback Regulation and Crosstalk

High Yki or YAP/TAZ activity, especially over the long term, re- 
sults in tissue overgrowth or cancer (see below). Thus, the phys- 
iological fluctuation of Hippo signaling must be tightly controlled 
to avoid detrimental effects. In Drosophila, regulation of the 
Hippo pathway is fine-tuned by a built-in negative-feedback 
loop in which activation of Yki turns on the expression of up- 
stream regulators, including Four-jointed, Ex, Mer, Kibra, and 
Wts (Cho et al., 2006; Genevet et al., 2010; Hamaratoglu et al., 
2006; Jukam et al., 2013). Consistently, a similar negative feedback
mechanism also exists in mammalian cells. YAP/TAZ
directly induce the transcription of NF2, LATS2, and probably
MST1, leading to LATS1/2 activation and YAP/TAZ inhibition
(Chen et al., 2015b; Dai et al., 2015; Moroishi et al., 2015b). In
addition, YAP/TAZ induce expression of angiomotin-like protein
2 (AMOTL2), a negative regulator of YAP (Mohseni et al., 2014;
Zhao et al., 2008). This negative feedback loop is critical for
maintaining the proper transient activation of YAP/TAZ upon
stimulation. By these mechanisms, YAP and TAZ antagonize 
each other and may provide a buffer for fluctuations in Hippo 
pathway activity to ensure tissue homeostasis. When this nega- 
tive feedback is disrupted, dysregulation of the Hippo pathway 
may lead to tumorigenesis. 

Elucidating Hippo pathway regulation and function is
complicated by crosstalk between the Hippo pathway and
many other developmental signaling pathways (reviewed in Varelas,
2014). Besides Wnt, YAP/TAZ may also be regulated by
sonic hedgehog (SHH) signaling (Fernandez-L et al., 2009). On
the other hand, YAP/TAZ have been shown to regulate the expression
of ligands for Wnt, Shh, transforming growth factor β
(TGF-β), JAK-STAT, EGFR, and Notch pathways (reviewed in
Yu et al., 2015). The extensive signaling crosstalk may create a 
microenvironment rich in different factors, which in turn regulates 
cell fate through autocrine or paracrine mechanisms in both cell- 
autonomous and cell-non-autonomous manners. 

Hippo Pathway in Early Embryonic Development 

YAP/TAZ are critical during early embryonic development.
Although TAZ knockout mice are viable, YAP knockout mice
die at E8.5, and blastomeres stop dividing before the morula
(16-32 cells) stage when YAP and TAZ are both deleted (Hossain
et al., 2007; Makita et al., 2008; Morin-Kensicki et al., 2006; Nishioka
et al., 2009; Tian et al., 2007). Therefore, the roles of YAP and TAZ
in early development partially overlap.

The first cell fate specification during embryogenesis occurs
during the preimplantation stage, in which the trophectoderm (TE)
and inner cell mass (ICM) are formed. The TE consists of the
outer cells of the blastocyst and forms extraembryonic tissues,
while the ICM contains the inner cells of the blastocyst and
gives rise to the embryo proper and other tissues. The formation
of the TE and ICM is mainly due to the position or polarity of cells
in the morula, in which the inner and apolar cells form the ICM,
while the outer and polar cells give rise to the TE (Sasaki,
2015). As early as the 16-cell stage, YAP/TAZ already show differential
subcellular localization between inner and outer cells,
and this difference lasts to the blastocyst stage. The different
distribution of YAP/TAZ in the TE and ICM results in different
gene expression signatures, especially the induction of TE-specific
genes, such as Cdx2 in outer cells, thus directing cell fate
specification (Nishioka et al., 2009). Indeed, mouse embryos
with TEAD4 knockout failed to develop TE cells, with all cells
differentiating into ICM. On the other hand, depleting LATS1/2,
NF2, or AMOT/AMOTL2 turns all cells into the TE lineage, and these
embryos failed to develop ICM-derived tissues (Cockburn
et al., 2013; Hirate et al., 2013; Lorthongpanich et al., 2013).
These results suggest that the Hippo pathway plays a key role
in early embryonic cell specification.

Hippo Signaling in Organ Size Control and Tissue 
Homeostasis 

The effect on organ size is the best-known physiological function 
of the Hippo pathway. In Drosophila, mutation of Hippo pathway 
kinases (hpo and wts) or upstream regulators (ex, mer, kibra, 
ft, etc.) leads to overgrowth of organs such as eyes, wings, or 
other appendages, and transgenic expression of yki results in a 
similar phenotype (reviewed in Halder and Johnson, 2011; Pan,
2010). The increased tissue/organ size is mainly due to Yki- 
induced cell proliferation and survival. 

The effect of the Hippo pathway on organ size is highly 
conserved in mammals, as revealed by many studies performed 
in mice (summarized in Table 2). For instance, liver-specific 
transgenic Yap expression in mice can produce a dramatically 
enlarged liver. Remarkably, the liver returns to its normal size 
via apoptosis once Yap overexpression is turned off (Camargo 
et al., 2007; Dong et al., 2007). Similarly, liver-specific knockout 
of Mst1/2, Sav1, or Nf2 also results in liver enlargement (Yin et al.,
2013; Zhang et al., 2010; Zhou et al., 2009). The mouse embryonic
heart is enlarged when Sav1, Mst1/2, or Lats1/2 is deleted,
and proliferation or apoptosis of cardiomyocytes is sensitive to
genetic manipulation of Yap (Del Re et al., 2013; Heallen et al.,
2011; Lin et al., 2014; von Gise et al., 2012; Xin et al., 2013; Xin
et al., 2011). 

However, not all organs are equally sensitive to Hippo pathway 
mutations. For instance, Mst1/2 knockout results in dramatic
overgrowth of liver, heart, stomach, and spleen, but not of kid- 
ney, lung, or limbs (Song et al., 2010). A possible explanation is 
that there are MST1/2-independent regulators of YAP/TAZ in
these tissues. In the breast and intestine, tissue-specific deletion 
of Yap does not result in any defects in tissue structure or size 
(Cai et al., 2010; Chen et al., 2014). Yap knockout in the mouse 
liver leads to bile duct defects, but not a general reduction of liver 
size. This lack of effect could be due to the presence of TAZ, 
which should be activated due to the loss of feedback inhibition 
upon Yap deletion. Alternatively, the Hippo pathway, or YAP/TAZ
activity, may be negligible for the size control of some organs. 
Nevertheless, the organ-specific effects of the Hippo pathway 
in size control may suggest that different principles are utilized 
for size regulation in different organs. For example, organ size 
could be determined by proliferation of differentiated cells or 
the number of progenitor cells in a pre-allocated pool. 

Physiological signals upstream of the Hippo pathway important
in organ size determination have yet to be identified.
Mechanical force or tension may change as a result of organ
growth and may inhibit YAP/TAZ when the organ reaches its final
size. Alternatively, organ size may be restricted/induced by a
soluble factor via autocrine/paracrine mechanisms, and the
concentration of this factor is controlled by organ size. These two
models are not mutually exclusive, and further investigations are
needed to address how the Hippo pathway senses physiological
cues to modulate organ size.

Table 2. Physiological and Pathological Functions of Hippo Pathway Genes in Mice

| Organ | Phenotypes | References |
|---|---|---|
| Liver | Yap-inducible expression results in hepatomegaly in a reversible manner and, in the long term, leads to development of liver tumors. | Camargo et al., 2007; Dong et al., 2007 |
| | Nf2, Sav1, or Mst1/2 deletion causes hepatomegaly and results in hepatocellular carcinoma, cholangiocarcinoma, or bile duct hamartoma. Mob1a/b deletion also causes liver cancer. | Benhamouche et al., 2010; Lee et al., 2010; Lu et al., 2010; McClatchey et al., 1998; Nishio et al., 2012; Song et al., 2010; Yin et al., 2013; Zhang et al., 2010; Zhou et al., 2009 |
| | Yap deletion causes bile duct defect, and YAP activity is involved in liver regeneration upon tissue damage. | Bai et al., 2012; Su et al., 2015; Yimlamai et al., 2014; Zhang et al., 2010 |
| | Lkb1-deficiency-induced liver overgrowth is dependent on YAP activation. | Mohseni et al., 2014 |
| Intestine | Yap transgenic expression causes intestinal dysplasia. | Camargo et al., 2007 |
| | Deletion of Sav1 or Mst1/2 in mouse intestine results in expansion of progenitor cells, colonic polyps, or adenoma. | Cai et al., 2010; Lee et al., 2008; Zhou et al., 2011 |
| | Deletion of Yap in mouse intestine shows no obvious phenotype but affects regeneration upon tissue damage. | Cai et al., 2010 |
| | Yap transgenic expression (intestine specific) results in rapid loss of intestinal crypts by repressing Wnt signaling. | Barry et al., 2013 |
| | Apc deletion induces expansion of intestinal crypts in a YAP/TAZ-dependent manner. | Azzolin et al., 2014; Cai et al., 2015 |
| Skin | Deletion of Sav1 or transgenic expression of Yap leads to expansion of basal progenitor cells and skin thickening; long-term activation of YAP results in squamous cell carcinoma. Mob1a/b deletion also causes skin cancer. | Camargo et al., 2007; Lee et al., 2008; Nishio et al., 2012 |
| | Deletion of Mst1/2 shows no clear phenotype. Deletion of Ctnna1 (α-catenin) leads to keratinocyte hyperproliferation and squamous cell carcinoma likely mediated by YAP activation. | Schlegelmilch et al., 2011 |
| | Gnas (Gαs) deletion causes basal-cell carcinoma partially dependent on YAP activation. | Iglesias-Bartolome et al., 2015 |
| Heart | Deletion of Sav1, Mst1/2, or Lats2 at embryonic stage or transgenic expression of active Yap mutant results in hyperproliferation of cardiomyocytes and enlargement of heart. | Del Re et al., 2013; Heallen et al., 2011; Lin et al., 2014; von Gise et al., 2012; Xin et al., 2013; Xin et al., 2011 |
| | Deletion of Yap leads to heart hypoplasia, and a more severe phenotype is observed when both Yap and Taz are deleted. | von Gise et al., 2012; Xin et al., 2011 |
| | In adult heart, high YAP/TAZ activity enhances heart regeneration following cardiac damage such as myocardial infarction. | Heallen et al., 2013; Lin et al., 2014; Xin et al., 2013 |
| Kidney | Taz deletion causes polycystic kidney disease, whereas Yap deletion leads to hypoplastic kidneys with severe defect in nephron morphogenesis. | Hossain et al., 2007; Makita et al., 2008; Reginensi et al., 2013; Tian et al., 2007 |
| | Kidney with Mst1/2 or Sav1 deletion appears normal. | Reginensi et al., 2013; Song et al., 2010 |
| Lung | Taz-deleted mice display abnormal alveolar structures. | Makita et al., 2008; Mitani et al., 2009; Tian et al., 2007 |
| | Mst1/2 deletion leads to disrupted lung structures and neonatal lethality, which is dependent on high YAP/TAZ activity. Mst1/2 deletion in adult lung bronchiolar epithelial cells results in airway hyperplasia and altered differentiation. YAP appears critical in regulating proximal-distal patterning of the lung, and a decrease in YAP activity ensures epithelial cell differentiation. | Lange et al., 2015; Lin et al., 2015a; Mahoney et al., 2014 |
| | Mob1a/b deletion causes lung tumor. | Nishio et al., 2012 |
| | Forkhead box A2 (FOXA2) has also been shown to be critical in mediating the effect of Mst1/2 in lung development. | Chung et al., 2013 |
| Pancreas | The effect of the Hippo pathway is mainly in the exocrine compartment of the pancreas. During the postnatal stage, deletion of Mst1/2 increases the ratio of ductal and acinar cells and leads to pancreatitis-like autodigestion and a reduced size of pancreas. | Gao et al., 2013; George et al., 2012 |
| | Kras (G12D)-mutant-induced pancreatic ductal adenocarcinoma requires YAP activity. | Zhang et al., 2014a |
| Nervous system | Nf2 deletion causes an expansion of the neural progenitor pool and results in enlargement of the cortical hem, malformation of hippocampus (at late embryogenesis), and thickening of the neurocortex. Nf2 deletion also affects development of corpus callosum, in which Yap-mediated overexpression of SLIT2 disrupts callosal axon pathfinding. | Lavado et al., 2013; Lavado et al., 2014 |
| Mammary glands | Yap and Sav1 are dispensable in mammary gland development. During pregnancy, Yap deletion results in hypoplasia and reduced alveolar structures; on the other hand, Sav1 deletion or transgenic expression of Yap prevents terminal differentiation of mammary cells. | Chen et al., 2014 |
| | Mob1a/b deletion causes breast tumor. | Nishio et al., 2012 |
| | Yap deletion delays mammary tumor growth induced by polyoma middle T antigen (PyMT). | Chen et al., 2014 |
| Muscle | Yap overexpression promotes proliferation of satellite cells and represses their differentiation. YAP downregulation reduces basal skeletal muscle fiber size, and YAP activity is required to relieve neurogenic muscle atrophy following injuries. | Judson et al., 2012; Watt et al., 2015 |

High YAP/TAZ activity has been observed in the stem or progenitor
cells of multiple tissues, suggesting a role for YAP/TAZ in
stem cell maintenance. For example, YAP is highly nuclear in
basal progenitor cells and in intestinal stem cells localized at
the crypt base (Barry et al., 2013; Camargo et al., 2007; Schlegelmilch
et al., 2011; Zhang et al., 2011). Activation of YAP, either by
transgenic expression of YAP or deletion of upstream regulators, 
usually results in expansion of progenitor cells, impaired cell dif- 
ferentiation, and hyperplasia of target tissues such as intestine, 
liver, skin, and nervous system (Cai et al., 2010; Camargo 
et al., 2007; Cao et al., 2008; Lee et al., 2008, 2010; Lu et al., 
2010; Zhou et al., 2011). 

The role of YAP/TAZ in cell proliferation and stem cell expansion
suggests a critical function of YAP in normal tissue development
and homeostasis. Indeed, tissue-specific deletion of Yap
results in abnormalities of the heart, skin, and kidney (Reginensi
et al., 2013; Schlegelmilch et al., 2011; von Gise et al., 2012; Xin
et al., 2011). However, mammary glands and the intestine remain
relatively normal upon Yap deletion (Cai et al., 2010; Chen et al.,
2014; Zhou et al., 2011). These findings suggest that YAP is
required for development and homeostasis of some, but not
all, tissues in mice. In humans, TEAD1 mutations are found in
Sveinsson chorioretinal atrophy, a disease characterized by
chorioretinal degeneration, and Aicardi syndrome, a congenital
neurodevelopmental disorder (Fossdal et al., 2004; Schrauwen
et al., 2015). In addition, loss-of-function mutations of YAP
have been identified in both isolated and syndromic optic fissure
closure defects (Williamson et al., 2014). Hence, the loss of
TEAD-mediated YAP transcriptional activity plays a role in
some degeneration-related disorders in humans.

Even though it is not required for development and normal ho- 
meostasis of some tissues, YAP activity is critical for tissue 
regeneration upon certain types of damage. For example, Yap
deletion severely compromises pregnancy-induced mammary 
tissue growth, although virgin mammary development was 
normal (Chen et al., 2014). Likewise, in wild-type mice, the intes- 
tines can effectively regenerate following colitis induced by 
dextran sulfate sodium (DSS) treatment; however, the regenera- 
tive capability is severely impeded in conditional Yap knockout 
mice (Cai et al., 2010). Similar results have also been observed 
in Drosophila midgut regeneration upon DSS-induced injury 
(Karpowicz et al., 2010; Ren et al., 2010; Shaw et al., 2010). Normally,
the liver regenerates efficiently following liver damage. For
instance, after partial hepatectomy, hepatocytes start to prolifer- 
ate to restore liver mass in a few days, and in this process, YAP 
activity is induced and is most likely required for complete liver 
regeneration (Grijalva et al., 2014; Su et al., 2015; Wu et al., 
2013; Yimlamai et al., 2014). In contrast to the intestine and liver,
tissue regeneration of the adult heart is very limited. However, 
inactivation of the Hippo pathway or transgenic expression of 
Yap restores some myocardial regenerative capability, although 
the efficiency is low. In contrast, cardiac-specific deletion of Yap 
impedes regeneration of the neonatal heart (Heallen et al., 2013; 
Lin et al., 2014; Xin et al., 2013). Taken together, these results 
indicate that YAP plays a significant role in regeneration of mul- 
tiple tissues. 
Hippo Signaling in Cancer

Long-term YAP activation, such as transgenic expression of Yap
in the mouse liver, results in cell transformation and tumor development
(Dong et al., 2007), indicating the importance of the Hippo
pathway in cancer initiation and progression. Evidence from
mouse models for a role of the Hippo pathway in tumorigenesis is
summarized in Table 2 and generally supports an oncogenic
role for YAP/TAZ, as well as a tumor-suppressive function for
Hippo pathway upstream components.

The tumor-promoting activity of YAP is largely dependent on a 
TEAD-mediated transcription program, as YAP-induced liver 
cancer is fully blocked by expression of a dominant-negative 
TEAD that is able to sequester both YAP and TAZ (Liu-Chittenden
et al., 2012). At the cellular level, YAP activation is important
for cell proliferation, survival, migration, and invasion. High YAP 
or TAZ activity enables the cell to escape contact inhibition and 
anoikis and to support anchorage-independent growth (Chan 
et al., 2008; Zhao et al., 2007, 2012). YAP induces expression 
of ZEB1/2 to stimulate the epithelial-to-mesenchymal transition 
(EMT), which is a key step for tumor metastasis (Gao et al., 
2014; Lei et al., 2008; Liu et al., 2010b; Overholtzer et al., 
2006). In addition, YAP is able to promote genomic instability 
(Fernandez-L et al., 2012), and TAZ is required to sustain self- 
renewal and the tumor-initiation capacity of breast cancer 
stem cells (Cordenonsi et al., 2011). 

Accumulating evidence suggests that the Hippo pathway is
dysregulated in many human cancers. Elevated YAP/TAZ expression
or nuclear enrichment of YAP/TAZ has been observed
in many types of cancers, including liver, breast, lung, colon,
ovary, and others (Chan et al., 2008; Moroishi et al., 2015a; Steinhardt
et al., 2008). However, the majority of cancers with high
YAP/TAZ activity have not been linked to genetic mutations of
the Hippo pathway, and the overall genetic alteration rate of 
Hippo pathway components in human cancer is relatively low 
(Table 3). 

One well-characterized example of a Hippo pathway mutation 
associated with cancer is in NF2, which causes neurofibroma- 
tosis 2 lesions, including schwannomas and meningiomas (Xiao 
et al., 2003). Moreover, inactivating NF2 mutations are also 
observed in 40%-50% of malignant mesothelioma (Sekido,
2011). Importantly, even heterozygous deletion of Yap completely
blocks liver tumorigenesis induced by Nf2 knockout in mice, indi- 
cating that YAP activation is the major mechanism mediating the 
tumorigenic potential of Nf2 mutations (Zhang et al., 2010). 

YAP gene amplification may contribute to a portion of hepato- 
cellular carcinomas, medulloblastomas, and esophageal squa- 
mous cell carcinomas (Fernandez-L et al., 2009; Overholtzer 
et al., 2006; Song et al., 2014; Zender et al., 2006). Gene fusions
involving YAP or TAZ have also been discovered in human cancers.
Remarkably, virtually all epithelioid hemangioendotheliomas
contain gene fusions of TAZ-CAMTA1, TAZ-FOSB, or
YAP-TFE3 (Antonescu et al., 2013; Errani et al., 2011; Flucke 
et al., 2014; Tanas et al., 2011). In addition, YAP gene fusions 
with MAMLD1 or C11orf95 have been discovered in a subset 
of ependymal tumors (Pajtler et al., 2015; Parker et al., 2014). It 
is worth noting that, in both epithelioid hemangioendotheliomas 
and ependymal tumors, all YAP/TAZ fusion proteins retain their
N-terminal TEAD-binding domain but lose the C-terminal transactivation
domain. These observations suggest that these YAP
fusions may still bind to and activate the TEAD-dependent transcriptional
program to promote tumorigenesis. Indeed, neural
stem cells carrying the YAP-C11orf95 fusion gene can effectively
form brain tumors when grafted into mice (Parker et al., 2014). In
addition, a familial YAP point mutation (R331W) has also been reported
to correlate with a high incidence of lung adenocarcinomas
(Chen et al., 2015a).

Aberrant GPCR signaling often results in tumorigenesis, so it is
possible that GPCR dysregulation can cause cancer by activating
YAP/TAZ. Activating mutations in GNAQ or GNA11 (encoding Gαq or Gα11,
respectively) have been identified in ~80%
of uveal melanomas and function as driver mutations (Van
Raamsdonk et al., 2009; Van Raamsdonk et al., 2010). Recent
studies showed that YAP is constitutively activated in GNAQ-
or GNA11-mutated uveal melanomas, and the high YAP activity
contributes to tumor growth (Feng et al., 2014; Yu et al., 2014).
In mice, deletion of Gnas (encoding Gαs) in skin stem cells initiates
basal-cell carcinogenesis, which is partially dependent on
YAP (Iglesias-Bartolome et al., 2015). Moreover, expression of
a viral GPCR induces tumorigenesis in Kaposi's sarcoma, where
YAP/TAZ also play a critical role (Liu et al., 2015).

LATS1/2 mutations or gene fusion have been sporadically 
identified in different cancers, which may lead to YAP/TAZ
activation (Table 3). In addition, crosstalk with other cancer-related
signaling pathways also likely contributes to high
YAP/TAZ activity in cancers that have no mutations of Hippo
pathway components. For example, KRAS, APC, and LKB1
mutations have all been reported to activate YAP/TAZ (Azzolin
et al., 2014; Gao et al., 2014; Mohseni et al., 2014; Zhang 
et al., 2014a). 

YAP/TAZ activity is also linked to drug resistance and cancer
relapse. Cultured breast cancer cells with high YAP/TAZ activity
show resistance to drugs such as taxol, 5-fluorouracil, and doxorubicin
(Cordenonsi et al., 2011; Lai et al., 2011; Touil et al.,
2014). Furthermore, lung and colon cancer cells with high YAP
activity are resistant to RAF- and MEK-targeted therapies (Lin
et al., 2015b). Tamoxifen is commonly used to treat estrogen receptor
(ER, a nuclear receptor)-positive breast cancer; however,
some ER-positive breast cancers are insensitive to tamoxifen.
Recently, tamoxifen has been shown to activate YAP/TAZ by
stimulating the membrane estrogen receptor GPER, a GPCR.
Therefore, activation of GPER by tamoxifen or estrogen may
contribute to tumor growth and drug resistance (Zhou et al.,
2015). Amplification of the YAP gene has been associated with
cancer relapse in KRAS-driven colon and pancreatic cancers
(Kapoor et al., 2014; Shao et al., 2014). Thus, inhibition of YAP/TAZ
will not only target tumor initiation and progression, but
also potentially sensitize cancer cells to chemotherapies and
prevent cancer relapse.

Notably, in contrast to its oncogenic function in most solid 
tumors, YAP seems to play a tumor-suppressor role in hemato- 
logical cancers. The YAP gene locus is frequently deleted in he- 
matological cancer, and expression of YAP or inhibition of MST1 
leads to growth inhibition and increased apoptosis (Cottini et al., 
2014). Currently, the underlying mechanism responsible for this 
tumor-suppressor function of YAP in hematological cancers is 
not well understood. 
Table 3. Genetic Alterations of Hippo Pathway Genes in Human Cancers

| Gene | Alteration | Cancer type | References |
|---|---|---|---|
| NF2 | mutation or deletion | mesothelioma; neurofibromatosis type 2 (schwannoma, meningioma) | Sekido, 2011; Rouleau et al., 1993 |
| LATS1/2 | gene fusion (LATS1-PSEN1) | mesothelioma | Miyanaga et al., 2015 |
| | LATS2 deletion | mesothelioma | Murakami et al., 2011 |
| | LATS1/2 mutations | sporadic in different cancers | Yu et al., 2013b |
| YAP | amplification | hepatocellular carcinoma; medulloblastoma; esophageal squamous cell carcinoma | Fernandez-L et al., 2009; Overholtzer et al., 2006; Song et al., 2014; Zender et al., 2006 |
| | mutation (R331W) | lung adenocarcinoma | Chen et al., 2015a |
| | gene fusion (YAP-TFE3, YAP-ESR1, YAP-C11orf95, and YAP-MAMLD1) | epithelioid hemangioendothelioma; luminal breast cancer; ependymal tumors | Antonescu et al., 2013; Flucke et al., 2014; Li et al., 2013; Pajtler et al., 2015; Parker et al., 2014 |
| | deletion | hematological cancer | Cottini et al., 2014 |
| TAZ | gene fusion (TAZ-CAMTA1 and TAZ-FOSB) | epithelioid hemangioendothelioma | Errani et al., 2011; Flucke et al., 2014; Tanas et al., 2011 |
| GNAQ/GNA11 | activating mutation | uveal melanoma | Van Raamsdonk et al., 2009; Van Raamsdonk et al., 2010 |

Therapeutic Targeting of Hippo Signaling 

The core Hippo pathway is a kinase cascade, and protein ki- 
nases are usually druggable. Thus, inhibitors for MAP4K4, 
MST1/2, or LATS1/2 may be developed to induce YAP/TAZ ac- 
tivity and facilitate the process of wound healing, tissue repair, or 
regeneration and possibly for treating degenerative diseases 
(Figure 4). For example, temporal inhibition of MST1/2 or 
LATS1/2 may promote myocardial regeneration or survival that 
would be beneficial for heart attack patients. It is also possible 
that inhibitors of MST1/2 or LATS1/2 could be used for treating
hematological cancers. 

Generally, MST1/2 and LATS1/2 are tumor suppressors, and
inhibition of MST1/2 or LATS1/2 may promote tumor growth in
most instances. On the other hand, inhibiting YAP/TAZ activity
would offer a new and attractive anti-cancer strategy (Park
and Guan, 2013). The function of YAP/TAZ is primarily mediated
by TEADs, so small molecules disrupting the YAP/TAZ-TEAD
interaction will function as YAP/TAZ inhibitors. Indeed, porphyrin
family molecules, especially verteporfin, are able to disrupt the
interaction between YAP/TAZ and TEADs, and verteporfin can
block transcription of YAP/TAZ target genes and suppress liver
overgrowth induced by YAP overexpression or NF2 inactivation 
in mice (Liu-Chittenden et al., 2012). However, verteporfin has 
general cellular toxicity and low aqueous solubility. Based on 
structural information from the YAP-TEAD and VGLL4-TEAD 
complex, a polypeptide termed “super-TDU” has been de- 
signed to block YAP-TEAD interaction and has been shown to 
suppress tumor growth in mouse models (Jiao et al., 2014). 

It is challenging to design direct activators for protein kinases.
However, LATS1/2 may be activated indirectly by molecules targeting
their upstream regulators. The very first small molecule
(dobutamine) identified with an inhibitory effect on YAP is an
agonist of a GPCR (Bao et al., 2011). Since then,
many indirect inhibitors for YAP/TAZ have been identified,
including the phosphodiesterase inhibitors rolipram and ibudilast
(Yu et al., 2013a). The Rho family GTPases have a strong inhibitory
effect on LATS1/2, and membrane localization is important
for Rho cellular function. Indeed, statins, which inhibit the
mevalonate metabolic pathway, can block membrane translocation of Rho
GTPases and indirectly inhibit YAP/TAZ activity (Mi et al.,
2015; Sorrentino et al., 2014; Wang et al., 2014). It will be interesting
to test whether these drugs are effective in suppressing
tumor growth in mouse models and to perform epidemiologic 
studies to determine whether patients using statins or rolipram 
have a lower incidence of cancer. Given the important function 
of the Hippo pathway in regulating cell proliferation and tissue 
homeostasis, it represents an exciting and previously unex- 
plored field for cancer therapy. 

Outstanding Questions 

Despite the rapid research progress in the Hippo pathway, some 
key questions remain unanswered, and new questions are 
emerging. Listed below are some of the key questions in the 
Hippo field. 

(1) What are the molecular mechanisms regulating MST1/2
activity? Many upstream signals have been convincingly
shown to regulate LATS1/2 phosphorylation and kinase
activity. However, neither MST1/2 phosphorylation nor
its kinase activity is strongly modulated by upstream
signals. Drosophila misshapen (a homolog of mammalian
MAP4K family kinases) acts upstream of Wts. An interesting question
is whether MST1/2 and MAP4Ks mediate different or
similar upstream signals to activate LATS1/2.

Figure 4. Therapeutic Targeting of the Hippo Pathway

(A) Potential roles of YAP/TAZ activity in tissue development and diseases.
A confined window of YAP/TAZ activity is required for normal tissue development
and homeostasis.

(B) Strategies for targeting YAP/TAZ activity. Inhibitors for MST1/2, MAP4K4,
and LATS1/2 can activate YAP/TAZ. YAP/TAZ-TEAD interaction may be disrupted
by small molecules directly (Verteporfin) or AMPK activators (Metformin).
Small molecules inhibiting Rho-family GTPases or ROCK can indirectly
activate LATS1/2, leading to YAP/TAZ inhibition.

(2) Where are Hippo pathway components localized in 
mammalian cells? This may be key to understanding 
how the Hippo pathway is regulated in response to up- 
stream signals. It is obvious that phosphorylated YAP/ 
TAZ are enriched in the cytoplasm and dephosphorylated 
YAP/TAZ in the nucleus. However, it is less clear where 
YAP/TAZ phosphorylation and dephosphorylation occur. 
A related question is what are the mechanisms underlying 
YAP/TAZ translocation between the nucleus and cyto- 
plasm? 

(3) How are LATS1/2 regulated by actin remodeling and/or
cellular tension? This is one of the key questions in under- 
standing the biochemical mechanism of the Hippo kinase 
cascade regulation. Accumulating evidence suggests 
that the actin cytoskeleton and cellular tension play a 
key role in LATS1/2 regulation and appear to act down- 
stream of many, if not most, upstream signals. The actin 
cytoskeleton and cellular tension are intertwined, thus 
the question is: which one plays a more direct role in regu- 
lating Hippo core components? 



(4) What is the mechanism of organ size sensing? Although 
many signals are reported to regulate the Hippo pathway 
in vitro, so far none have been demonstrated to play a key 
role in organ size control in vivo. Uncovering this magic 
signal will solve a key question in developmental biology. 

(5) How does YAP become deregulated in cancer? It is clear 
that YAP/TAZ activation is observed in a broad spectrum 
of human cancers, although mutations in Hippo pathway 
genes are rare. This interesting conundrum indicates that 
the Hippo pathway may be regulated broadly by many 
other cancer-driving pathways.

In Brief

A chemical footprinting method reveals 
that polymers of low-complexity domains 
exhibit similar cross-β structure in
hydrogels, liquid-like droplets, and nuclei 
of mammalian cells, suggesting a 
common underlying structural basis.

Highlights

• A footprinting method was used to probe cross-β structure
of LC domain polymers 

• Similar footprints were obtained from hydrogels, liquid-like 
droplets, and nuclei 

• Mutations impeding hydrogel binding map to the core of the 
LC domain footprint 

• Hydrogel and liquid-like droplet formation is driven by cross-β
polymerization

SUMMARY

Many DNA and RNA regulatory proteins contain poly- 
peptide domains that are unstructured when analyzed 
in cell lysates. These domains are typified by an over- 
representation of a limited number of amino acids and 
have been termed prion-like, intrinsically disordered 
or low-complexity (LC) domains. When incubated at 
high concentration, certain of these LC domains poly- 
merize into labile, amyloid-like fibers. Here, we report 
methods allowing the generation of a molecular foot- 
print of the polymeric state of the LC domain of 
hnRNPA2. By deploying this footprinting technique 
to probe the structure of the native hnRNPA2 protein 
present in isolated nuclei, we offer evidence that its 
LC domain exists in a similar conformation as that 
described for recombinant polymers of the protein. 
These observations favor biologic utility to the poly- 
merization of LC domains in the pathway of informa- 
tion transfer from gene to message to protein. 

INTRODUCTION 

DNA and RNA regulatory proteins are composed of two func- 
tional domains. Gene-specific transcription factors contain 
DNA binding domains that recognize specific sequences via 
structurally ordered states, including zinc fingers, homeoboxes, 
helix-loop-helix domains, and leucine zipper domains (Pabo and 
Sauer, 1992). Likewise, RNA binding proteins are able to bind 
RNA via structurally ordered KH domains, RNA recognition mo- 
tifs, and pumilio domains (Lunde et al., 2007). 

Most DNA and RNA regulatory proteins also contain polypep- 
tide domains that lack structural order when purified from cellular 
lysates. The unstructured activation domains of certain tran- 
scription factors contain an over-representation of acidic amino 
acids (Hope et al., 1988). In the context of gene-specific tran- 
scription factors, these structurally disordered domains have 
been termed “acid blobs” or “negative noodles” (Sigler, 1988) 
and other conceptualizations invoking biological function in the 
absence of folded protein structure. 

Not all activation domains associated with gene-specific
transcription factors are acidic. Some are enriched in glutamine 
residues and others in proline residues (Triezenberg, 1995). 

Common, however, among the majority of activation domains 
is the over-representation of one or a small grouping of 
amino acids. Instead of utilizing a balanced proportion of all 
20 amino acids, these domains are of low complexity in nature. 
Nucleic acids deploy a four-letter code, proteins a 20-letter
code. Low-complexity (LC) domains operate via the deploy- 
ment of a highly skewed distribution of amino acids and would 
appear to be much more DNA and RNA like in the nature of 
their code. 

RNA binding proteins also contain LC domains, including re- 
petitive polymers of serine and arginine (SR) in many proteins 
that regulate pre-mRNA splicing (Manley and Tacke, 1996), 
and G/S-Y-G/S repeats in the LC domains associated with the 
FET, CIRBP/RBM3, and hnRNP families of RNA binding proteins 
(Kato et al., 2012). Compared with gene-specific transcription 
factors and their activation domains, less attention has been 
paid to the LC domains associated with RNA regulatory proteins. 
Some degree of attention has been focused on the LC domains 
associated with the FET family of RNA binding proteins, in- 
cluding fused in sarcoma (FUS), Ewing’s sarcoma (EWS), and 
TAF15. The amino terminal LC domains of these three proteins 
can be translocated onto DNA binding domains as the causative 
event in many forms of human cancer (Riggi et al., 2007). In the 
context of these fusion proteins, the LC domains of the FET pro- 
teins are understood to function as potent transcriptional activa- 
tion domains. 

Several years ago, we inadvertently observed polymerization 
of the LC domains of the FET proteins, as well as certain hnRNP 
proteins (Kato et al., 2012). When incubated at high concentra- 
tion, the LC domains of these proteins polymerize into amy- 
loid-like fibers. A combination of X-ray diffraction and electron 
microscopy gave evidence that the fibers were of the prototypic 
cross-β structure first described 50-60 years ago by Astbury
et al. (1959). Unlike irreversible, pathogenic amyloids, the fibers 
polymerized from LC domains present in FUS, TAF15 and 
hnRNPA2 are readily disassembled upon dilution. By comparing 
the effects of mutations in the LC domains of FUS and TAF15 on
both transcription activation capacity and polymerization, a 
strong correlative relationship gave evidence that polymerization 
might be of critical importance to the function of these domains 
in living cells (Kwon et al., 2013). 

Heretofore missing from this line of research was any evidence 
indicative of the structure of LC domains in their native state. In 
attempts to address this shortcoming, we developed a chemical 
probing strategy that allows generation of a footprint indicative

Figure 1. Differing Patterns of Acetylation
of Folded and Denatured Samples of
Glutathione-S-Transferase Mediated by
N-Acetylimidazole

(A) Folded GST was exposed to NAI under con- 
ditions leading to roughly one modification per 
polypeptide chain, with the reaction quenched by 
the addition of 0.8 M Tris. A separate batch of GST 
grown in bacterial cells supplemented with 13C-labeled
tyrosine was denatured in 5 M guanidine
thiocyanate prior to NAI treatment. Following
quenching with Tris, the two samples were mixed,
digested with chymotrypsin, and subjected to
SILAC mass spectrometry.

(B) 19 acetylated side chains were scored for 
abundance in the two samples, yielding an NAI 
footprint. The degree of residue protection from 
NAI modification in the folded state, relative to the 
denatured state, is measured on the y axis as log2 
values. 




(C) Plot showing the correlative relationship be- 
tween the degree of protection from NAI in the 
folded state, relative to the denatured state (x 
axis), and the measured level of solvent accessi- 
bility determined from the X-ray crystal structure of 
GST (y axis). 

See also Figure S1 and Tables S1 and S2.



of ordered structure. After having validated the utility of the 
approach using two enzymes of known structure, we deployed 
the footprinting strategy on fibrous polymers of the LC domain 
of hnRNPA2. Our observations give evidence that the LC domain 
of hnRNPA2 exists in the same structural state in both recombi- 
nant polymers of the protein and native hnRNPA2 within the nu- 
clear compartment of mammalian cells. 



RESULTS 

Development of a Chemical Footprinting Method 

N-acetylimidazole (NAI) is a reactive chemical that is capable of 
acetylating certain amino acid side chains in proteins (Riordan 
et al., 1965; Timasheff and Gorbunoff, 1967). Under conditions 
of neutral pH, the chemical can donate an acetyl group to serine, 
tyrosine, lysine, threonine, arginine, and asparagine side chains. 
Reasoning that the ability of NAI to modify amino acids might be 
influenced by the structural state of a protein, we compared 
modification of glutathione-S-transferase (GST) as a function of 
its folded versus unfolded state. GST enzyme was prepared under
conditions of isotopic labeling with 13C-labeled tyrosine to
produce a “heavy” protein sample. This sample was denatured 
with a chaotropic reagent and exposed to NAI under conditions 
leading to roughly one modification per polypeptide. The reac- 
tion was quenched with 0.8 M Tris (pH 8.8), which inactivates 



the NAI reagent. A corresponding “light” 
sample of GST was exposed to NAI in 
its folded state, again under conditions 
resulting in roughly one modification per 
polypeptide and again quenched with 
Tris to terminate the reaction. The heavy 
and light samples were mixed at a 1:1 ratio, digested with 
chymotrypsin and then evaluated by SILAC mass spectrometry 
(Figure 1A and Experimental Procedures).

The patterns of NAI reactivity with the denatured and folded 
states of GST were different. Certain amino acid side chains re- 
acted similarly in the two protein samples (Y23, K27, K40, K64, 
Y111, and R182), whereas others were acetylated to a lesser 
extent in the folded sample compared with denatured GST (Y7, 
Y57, Y58, and Y192). The NAI “footprint” of GST is shown in
Figure 1B. We then compared this footprint with the degree of surface
exposure of NAI-modified side chains as deduced from the X-ray
crystal structure of the enzyme (Rufer et al., 2005) (PDB: 1Y6E). A
strong correlative relationship was observed between NAI accessibility
and solvent exposure in the structure (Figure 1C and Table
S1). Surface-exposed residues tended to be NAI accessible,
whereas residues buried within the core of the enzyme tended to 
be NAI inaccessible. We conclude that the correlative match be- 
tween NAI-accessibility and protein structure gives evidence 
that the NAI footprint is properly reflective of protein structure. 
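
The protection score plotted in a footprint of this kind can be illustrated with a minimal Python sketch. It computes a log2 ratio from the heavy (denatured) and light (folded) intensities of a single acetylated peptide. The function name, the sign convention (positive values mean the side chain was protected in the folded sample), and the example intensities are assumptions made for illustration; they are not taken from the Experimental Procedures.

```python
import math


def protection_log2(denatured_intensity: float, folded_intensity: float) -> float:
    """Return log2(denatured / folded) for one acetylated peptide.

    Assumed convention: a positive value means the side chain was acetylated
    less in the folded sample, i.e. it was protected from NAI.
    """
    if denatured_intensity <= 0 or folded_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(denatured_intensity / folded_intensity)


# Hypothetical intensities (heavy/denatured, light/folded), not measured values.
example_peptides = {
    "exposed_site": (1000.0, 1000.0),  # equal abundance in both samples
    "buried_site": (4000.0, 1000.0),   # four-fold less abundant when folded
}

footprint = {
    name: protection_log2(heavy, light)
    for name, (heavy, light) in example_peptides.items()
}
```

Under this convention a residue that is acetylated equally in both samples scores 0, and each doubling of the denatured-to-folded ratio adds one unit of protection.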

Proceeding from a recombinant protein sample to a native 
protein within mammalian nuclei, we evaluated the difference 
in NAI modification of the poly-ADP-ribose polymerase (PARP) 
enzyme as a function of its folded versus denatured state. Nuclei 
were prepared from 293T cells that had been grown in either 
normal tissue culture medium (light) or medium deprived of tyro- 
sine and supplemented with an isotopically labeled form of the 



830 Cell 163, 829-839, November 5, 2015 ©2015 Elsevier Inc.







amino acid (heavy). The light sample of nuclei was exposed to a 
30 mM level of NAI for 15 min before quenching with Tris. The 
heavy sample was denatured in 5 M guanidine thiocyanate prior 
to exposure to the same level of NAI and then also quenched 
with Tris. The samples were combined, digested with chymo- 
trypsin overnight, and processed by mass spectrometry (Exper- 
imental Procedures). 

NAI modification was monitored on 14 amino acid side chains 
in the native and denatured forms of PARP (Figure S1A). Six residues
were modified by NAI far more extensively in the denatured
sample than the intact enzyme (K621, T799, K802, Y817, S902,
and S904), five residues were modified slightly more extensively 
in the denatured sample relative to the intact enzyme (K571, 
S782, S783, S808, and K816), and three residues were modified 
equally in the two samples (K616, K903, and Y907). We again 
observed a correlation between NAI accessibility and protein 
structure (PDB: 3GJW) (Figure S1B and Table S2). The three
side chains that were modified equally in the two samples 
show a high level of predicted solvent accessibility in the X-ray 
crystal structure of PARP (Miyashiro et al., 2009). Likewise, five 
of the six residues observed to be highly protected from NAI 
modification are predicted to be solvent inaccessible by the 
crystal structure of the enzyme. 

Analysis of three consecutive residues in the polypeptide 
chain of PARP is particularly revealing. Serine residue 902 is 
protected from NAI modification in nuclear PARP and buried 
beneath the surface of the enzyme. Lysine residue 903 is surface 
exposed and NAI accessible in the folded form of PARP. Finally, 
serine residue 904 is NAI inaccessible in the folded enzyme and 
buried beneath the surface of the PARP crystal structure. We 
offer that the correlative relationship between NAI accessibility 
and the predicted level of surface exposure of a given amino 
acid side chain validates this means of probing protein structure 
both in a recombinant protein and a native enzyme present in 
nuclei of mammalian cells. 

Determination of the NAI Footprint of Recombinant 
hnRNPA2 Fibers 

Hydrogel droplets were formed using a fusion protein linking 
mCherry to the LC domain of hnRNPA2 (Kato et al., 2012). This 
protein sample was exposed to NAI under conditions resulting 
in roughly one modification per polypeptide chain and then 
quenched with Tris (Experimental Procedures). Similarly pre- 
pared hnRNPA2 polymers were formed using protein isotopically 
labeled with heavy tyrosine. The latter sample was denatured 
with guanidine thiocyanate prior to NAI-mediated modification, 
followed by quenching with Tris. The two samples were com- 
bined at a 1:1 ratio, digested with chymotrypsin, and then 
analyzed by mass spectrometry (Experimental Procedures). 

23 amino acid side chains were evaluated for NAI accessibility. 
12 amino acids appeared to be equally accessible to NAI-mediated
modification in the two samples, and 11 appeared to be less
accessible in the native fibers relative to the denatured protein 
sample (Figure 2). Three of these acetylated amino acid residues 
could be identified in the same peptide spanning amino acids 
302-319 of the hnRNPA2 polypeptide. High-performance liquid 
chromatography (HPLC) was successful in
separating variants of this peptide acetylated at lysine 305, 



serine 306, or serine 312 (Figure 2B). The peptide variant acety- 
lated at K305 was found at equal abundance in both light and 
heavy samples, indicative of the ability of NAI to modify this res- 
idue irrespective of whether the protein was in the fibrous or de- 
natured state. The variant acetylated at S306 was considerably 
less abundant in the light sample than the heavy sample, giving 
evidence of its protection from NAI modification in the fibrous 
state. Finally, the variant acetylated at S312 was slightly less 
abundant in the light sample relative to the heavy sample, which 
is consistent with partial protection from NAI modification when 
the LC domain of hnRNPA2 existed in the polymeric state. 

The pattern of protection from NAI modification in polymeric 
fibers of the hnRNPA2 LC domain, or lack thereof, can be 
described in the following way. An extensive, N-terminal region 
of the protein was equally acetylated by the chemical probe irre- 
spective of the fibrous or denatured state. An equally extensive 
segment corresponding to a more C-terminal region of the LC 
domain was protected in the polymeric state, relative to the de- 
natured state, at 11 out of 12 acetylated residues. Right within 
the middle of this apparently ordered region of the LC domain, 
lysine residue 305 was found to be equally accessible in both 
the polymeric and denatured states of the protein. Finally, the 
three most C-terminal residues scored in the assay were all 
equally accessible under both fibrous and denatured states. 

Relationship of hnRNPA2 Footprints between 
Recombinant and Nuclear Forms of the Protein 

Using the same methods described for determining an NAI footprint
for the nuclear form of the PARP enzyme (Figure S1), we
probed the structure of native hnRNPA2 present in nuclei freshly 
prepared from 293T cells (Experimental Procedures). Isotopi- 
cally labeled heavy protein was probed under the denaturing 
conditions of 5 M guanidine thiocyanate. Light protein was 
probed via the exposure of nuclei to the NAI chemical reagent. 
Following quenching with Tris, the samples were mixed, di- 
gested with chymotrypsin, and evaluated by mass spectrometry. 

The NAI footprint observed for native, nuclear hnRNPA2 could 
be scored for 18 of the 23 acetylated residues observed in the footprint
derived from recombinant hnRNPA2, and the two footprints
were qualitatively similar (Figure 3A). Of the acetylation events de- 
tected in both footprints, all nine residues that were equally acces- 
sible to NAI-mediated acetylation in both polymeric and denatured 
samples of recombinant hnRNPA2 protein were also acetylated 
equally in the native hnRNPA2 irrespective of structural state. 
Seven of the eight residues that were preferentially protected 
from acetylation as a function of the fibrous state of recombinant 
hnRNPA2 protein were also preferentially protected in the native 
hnRNPA2 protein relative to nuclear protein that had been dena- 
tured with 5 M guanidine thiocyanate. The single qualitative differ- 
ence between the two footprints was tyrosine residue 324. This 
residue was preferentially protected from NAI-mediated acetyla- 
tion in the fibrous form of recombinant hnRNPA2 yet was equally 
accessible to the chemical probe in native hnRNPA2 irrespective 
of whether nuclei were left intact or denatured. 

Despite displaying qualitative similarities, the NAI-generated 
footprints for recombinant and native hnRNPA2 differed quanti- 
tatively in a consistent manner. The NAI protected residues 
observed in recombinant hnRNPA2 yielded an average of roughly

Figure 2. Footprint of NAI-Mediated Acety-
lation of Recombinant hnRNPA2 Polymeric 
Fibers 

(A) Electron micrographs of negatively stained 
polymeric fibers formed from an mCherry:hnRNPA2 
fusion protein (Experimental Procedures). Scale bar, 
70 nm. 

(B) HPLC separation of chymotryptic digestion 
products of the LC domain of hnRNPA2 corre- 
sponding to residues 302-319. The S312 acety- 
lated peptide eluted earlier from the column than 
the S306 acetylated peptide, which, in turn, eluted 
earlier than the K305 acetylated peptide (Experi- 
mental Procedures). 

(C) Relative abundances of the K305 acetylated 
peptides in folded versus denatured samples. 

(D) Relative abundances of the S306 acetylated 
peptides in folded versus denatured samples. 

(E) Relative abundances of the S312 acetylated 
peptides in folded versus denatured samples. 

(F) NAI footprint of the LC domain of hnRNPA2 (all 
data are presented as means ± SD). 

See also Figure S2 and Table S3.

3-fold (log2 ~1.8) difference when comparing peptide abun-
dance in the light (fibrous) and heavy (denatured) samples. 
Turning to the native hnRNPA2 assayed in either intact or dena- 
tured nuclei, the average difference in peptides revealing NAI pro- 
tected residues was roughly 1.5-fold (log2 ~0.5). Interpreted 
most simply, this difference gives indication that a smaller frac- 
tion of the native hnRNPA2 present in nuclei may exist in the 
structurally ordered state than the fraction deduced by studies 
of recombinant hnRNPA2 polymeric fibers. 

Co-expression of hnRNPA2 with Peptidyl-prolyl Cis- 
trans Isomerase Causes Tyrosine 324 to Become NAI 
Accessible in Recombinant Polymers 

The NAI footprint observed in recombinant hnRNPA2 polymeric fi- 
bers was qualitatively similar to that observed for native hnRNPA2 
in intact nuclei. Among 18 residues defining the footprint, tyrosine
324 was the single amino acid that was clearly different in the two 
samples. This residue was protected from NAI-mediated acetyla- 
tion in fibrous preparations of recombinant hnRNPA2, but not in 
the native hnRNPA2 present in intact nuclei. 



Proline residues are found six positions 
on the amino terminal side of tyrosine 
324, and two positions on its carboxyl 
terminal side (Figure S2). Proteomic 
studies of cellular proteins that bind to 
hydrogel droplets formed from the LC 
domains of both hnRNPA2 and FUS 
revealed retention of peptidyl-prolyl cis- 
trans isomerase 1 (PPIA), the most abun- 
dant isoform of a family of peptidyl-prolyl 
cis-trans isomerase enzymes. PPIA has 
been reported to interact with RNA 
granule proteins upon biochemical frac- 
tionation (Lauranzano et al., 2015), and 
antibodies to the enzyme revealed co- 
localization with stress granules (Figure S3). We thus reasoned 
that the PPIA enzyme might affect the structure of hnRNPA2 fi- 
bers by facilitating cis-trans interconversion of the peptide bonds 
of proline residue 319 or 326 of the hnRNPA2 polypeptide. 

To test this hypothesis, mCherry:hnRNPA2 was co-expressed 
with either the native form of PPIA or a catalytically inactive 
mutant (Zydowsky et al., 1992). Following purification of the 
mCherry:hnRNPA2 protein, polymeric fibers were formed and 
exposed to the NAI probe under either the polymeric or dena- 
tured state. Co-expression of hnRNPA2 with the active form of 
PPIA yielded an NAI footprint wherein tyrosine residue 324 was 
equally accessible to acetylation irrespective of fibrous or dena- 
tured state (Figure 3B, top). By contrast, co-expression with the 
catalytically inactive form of PPIA yielded a footprint indistin- 
guishable from that seen on recombinant hnRNPA2 never 
exposed to the enzyme (Figure 3B, bottom). 

The bottom panel of Figure 3 (Figure 3C) correlatively com- 
pares the NAI footprints of hnRNPA2 observed in native protein 
within intact nuclei with that of recombinant protein expressed in 
either the absence or presence of PPIA. The r-value of correlation

Figure 3. NAI Footprints of the LC Domain of
hnRNPA2 Deduced from Recombinant Pro- 
tein, Native Nuclear hnRNPA2, and Recom- 
binant Protein Co-expressed with Peptidyl- 
prolyl Cis-trans Isomerase 

(A) NAI footprint of recombinant hnRNPA2 fibers as 
described in Figure 2 (upper footprint) compared 
with NAI footprint deduced from native, nuclear 
hnRNPA2 (lower footprint). Note that tyrosine 
324 is protected from NAI modification in the 
folded, recombinant form of hnRNPA2, but not in 
the footprint deduced from the native, nuclear 
protein. 

(B) NAI footprint of recombinant hnRNPA2 co-ex- 
pressed with active PPIA enzyme (upper footprint) 
compared with footprint of hnRNPA2 co-ex- 
pressed with a catalytically inactive form of the 
enzyme (lower footprint). Note that co-expression 
of hnRNPA2 with the active form of PPIA causes 
tyrosine 324 to become exposed to NAI modifi- 
cation in the polymeric state. 

(C) Plots showing the correlative relationship of 
the NAI footprint of recombinant hnRNPA2 to 
that of the native, nuclear form of the protein. 
Correlation plot on left compares the footprint of 
recombinant hnRNPA2 not exposed to the PPIA
enzyme with the nuclear hnRNPA2 footprint. 
Correlation plot on right compares the footprint of 
recombinant hnRNPA2 co-expressed with the 
active PPIA enzyme with the nuclear hnRNPA2 
footprint. 

See also Figure S3. 



of the native and recombinant footprints was 0.76, which in- 
creased to 0.89 when the recombinant hnRNPA2 had been co-ex- 
pressed with PPIA. 
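
For readers who wish to reproduce a comparison of this type, the sketch below shows one way to compute a Pearson correlation coefficient between two footprints, each given as a list of log2 protection values for the same residues in the same order. This is a generic textbook calculation, not necessarily the procedure used to obtain the r-values reported here, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Each input value would be the log2 protection score of one residue scored in both footprints; residues detected in only one footprint would have to be dropped before the calculation.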

Mutations in the NAI-Protected Region of the hnRNPA2 
LC Domain Impede Hydrogel Binding 

Is the NAI footprint telling us anything of functional relevance to 
the LC domain of hnRNPA2? To address this question, we pre- 
pared mutated variants of the LC domain of hnRNPA2 wherein 
all 25 phenylalanine and tyrosine residues were individually 
mutated to serine (Figure S2). GFP fusion proteins representing 
wild-type hnRNPA2 and all of the individual mutants were ex- 
pressed in bacterial cells, purified, and assayed for the ability 
to adhere to mCherry:hnRNPA2 hydrogel droplets (Figure 4A). 

Of the 25 mutants, 6 were found to substantially impede bind- 
ing to hydrogel droplets formed from mCherry fused to the 
wild-type LC domain of hnRNPA2. Five of the six tyrosine- or 
phenylalanine-to-serine mutations that substantially impede hy- 
drogel binding occur within the region of the LC domain that is 
protected from NAI modification in the fibrous state (Y278S, 
Y283S, F291S, F309S, and Y319S). The sixth mutation that significantly
impeded hydrogel binding, Y264S, occurs on the amino
terminal side of the NAI protected region within a span where we 
failed to find acetylated side chains— a dead zone in the footprint 



(residues 258-282, Figure 2F). We tentatively conclude that 
these six residues are particularly important for polymerization 
of hnRNPA2 and that polymerization causes NAI protection. 

The remaining 19 mutants fell into two categories with respect 
to hydrogel binding. 12 mutants bound to hydrogels in a manner 
indistinguishable from wild-type hnRNPA2. Two of these mutants,
Y335S and Y341S, were located in the very C-terminal region
of the LC domain, concordant with a small region that was
fully accessible to NAI modification irrespective of whether the 
protein was in a polymeric or denatured state. Seven of these 
phenotype-void mutants, F95S, F197S, F207S, F215S, Y222S, 
F228S, and Y250S, were located in the amino terminal region 
of the LC domain that was widely accessible to NAI modification 
irrespective of structural state. The remaining three mutations 
that had no discernible effect on hydrogel binding, F244S, 
Y257S, Y275S, were all localized in the dead zone of the NAI 
footprint. Finally, seven mutants, including Y235S, Y250S, 
Y271S, Y288S, Y274S, Y301S, and Y324S, mildly affected bind- 
ing to mCherry:hnRNPA2 hydrogels. These seven mutants map- 
ped randomly across the LC domain of hnRNPA2. We conclude 
that tyrosine- and phenylalanine-to-serine mutations in NAI pro- 
tected regions impede hydrogel binding, whereas those in NAI 
accessible regions do not impede hydrogel binding. This conclu- 
sion favors functional significance of the NAI footprint.

Figure 4. Correlative Relationship between
Binding of Mutated Variants of the LC 
Domain of hnRNPA2 to Hydrogels Relative 
to Their Partitioning into Liquid-like Droplets 

(A) All phenylalanine and tyrosine residues within 
the LC domain of hnRNPA2 were individually 
mutated to serine, expressed as GFP fusion pro- 
teins, purified and tested for binding to mCher- 
ry:hnRNPA2 hydrogel droplets (Experimental Pro- 
cedures). Top figures show images of hydrogel 
binding by GFP linked to the native LC domain of 
hnRNPA2 (WT), the F215S mutant, the Y271S 
mutant, and the F291S mutant. Confocal images 
were scanned to yield the signal intensity of bound 
GFP (Experimental Procedures), yielding the 26
scans in the lower part of the figure. The x axis indicates
the scanned distance in μm, and the y axis indicates
the GFP signal intensity in arbitrary units.

(B) Liquid-like droplets formed upon binding of a 
PTB:hnRNPA2 fusion protein to a synthetic RNA 
containing five copies of the PTB recognition 
sequence (Experimental Procedures; see also 
Figure S4). The presence of a SNAP tag allowed the 
PTB:hnRNPA2 fusion protein to be appended with 
a red dye. When exposed to GFP alone, no parti- 
tioning into liquid-like droplets was observed (data 
not shown). When exposed to GFP fused to the 
native LC domain of hnRNPA2 (WT), clear evi- 
dence of partitioning was observed within minutes. 
Certain phenylalanine- or tyrosine-to-serine mu- 
tants partitioned well into liquid-like droplets 
(F215S), whereas others did not (Y271S and F291S).

(C) Plot showing the correlative relationship be- 
tween hydrogel binding and partitioning into liquid- 
like droplets for GFP linked to the native (WT) 
LC domain of hnRNPA2 along with 25 individual 
phenylalanine- and tyrosine-to-serine mutants. 

(D) Partitioning into liquid-like droplets was quan- 
tified for all phenylalanine- and tyrosine-to-serine 
mutants that had been constructed and assayed 
for binding to mCherry:hnRNPA2 hydrogel droplets 
(A). Histogram shows relative levels of partitioning 
of GFP linked to the native (WT) LC domain 
of hnRNPA2 as compared with the 25 individual 
mutants. 

See also Supplemental Information and Figure S2. 



Mutations in the LC Domain of hnRNPA2 Act 
Correlatively on Hydrogel Binding and Partitioning into 
Liquid-like Droplets 

During the preparation of GFP:FUS hydrogel droplets, we have 
long observed that the concentrated protein solutions become 
cloudy prior to gelation. Reanalysis of a His6-tagged LC domain
of FUS by light microscopy revealed the cloudy solution to be 
composed of liquid-like droplets (Figure S4). A number of inves- 
tigators have recently reported that LC domains from a variety of 
proteins, including FUS, hnRNPA1, and DDX4, can prompt formation
of liquid-like droplets (Altmeyer et al., 2015; Lin et al.,
2015; Molliex et al., 2015; Nott et al., 2015; Patel et al., 2015). 

It is of potential importance to know whether the physical 
forces leading to hydrogel formation (polymerization of LC do- 
mains) are the same or different from those leading to liquid- 
like droplets. To this end, we have followed the procedures of 
Lin et al. (2015) to create liquid-like droplets driven by the LC



domain of hnRNPA2. A triple fusion protein was prepared linking 
the LC domain of hnRNPA2 on the C-terminal side of a poly- 
pyrimidine tract-binding protein (PTB) RNA binding domain, 
which was in turn linked to maltose binding protein (MBP), with 
a tobacco etch virus (TEV) protease cleavage site between the 
MBP and PTB domains (Experimental Procedures and Fig- 
ure S4B). The MBP:PTB:hnRNPA2 LC domain fusion further 
contained a His6 tag at its C terminus, as well as a SNAP tag
for dye labeling on the N-terminal side of the PTB domain. 

Following co-expression with PPIA and purification via nickel and
amylose resin chromatography, the protein was mixed with a 
synthetic RNA containing five copies of a PTB binding site 
and exposed to TEV protease. Within 10 min, liquid-like droplets 
could be observed by light microscopy (Figure 4B). We then 
deployed a droplet partitioning assay to assess whether 
GFP:hnRNPA2 LC domain fusion proteins could be incorporated 
into the liquid-like droplets. Recombinant GFP-alone protein

Figure 5. Liquid-like Droplets Display the
Same NAI Footprint as Found in Hydrogel 
Polymers and the Native hnRNPA2 Present 
in Nuclei Freshly Isolated from Mammalian 
Cells 

(A) A fusion protein linking maltose binding protein
(MBP) to the RNA binding domains of PTB
and the LC domain of hnRNPA2 (Figure S4B) was
co-expressed with the peptidyl-prolyl cis-trans
isomerase enzyme (PPIA), purified, and mixed
with a synthetic RNA containing five PTB binding
sites. Addition of TEV protease triggered the
rapid formation of liquid-like droplets (Figure 4B).
Protein samples were footprinted with the NAI
reagent as a function of time before and after TEV
protease cleavage. Hints of the NAI footprint
could be seen in the protein sample before 
exposure to TEV protease, and the intensity of 
the footprint was sequentially enhanced at 
the 10 min, 2 hr, and 18 hr post-cleavage time 
points. 

(B) The log2 ratio of NAI protection for all of the 1 8 
acetylated amino acids is plotted on the y axis as 
a function of time post-exposure to TEV protease 
(x axis). 
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was not enriched in these liquid-like droplets relative to the sur- 
rounding buffer. The GFP fusion linked to the wild-type LC 
domain of hnRNPA2 was rapidly incorporated into liquid-like 
droplets. Using this assay, we evaluated all 25 mutants that 
had been scored for hydrogel binding (Figures 4B and 4D). 

Six mutants were impeded by more than 50% with respect to 
partitioning into liquid-like droplets (Y257S, Y264S, Y278S,
F291S, F309S, and Y319S), another eight mutants were partially
impeded (F195S, F207S, Y235S, Y250S, Y283S, Y288S, Y294S,
and Y301S), and the remaining mutants were incorporated 
into liquid-like droplets in a manner indistinguishable from the 
wild-type LC domain (Figure 4D). The correlation plot shown in 
Figure 4C gives evidence of a strong concordance (r = 0.83) be- 
tween the effects of mutations on hydrogel binding and partition- 
ing into liquid-like droplets. We offer that this concordance gives 
evidence that similar regions of the protein promote both hydro- 
gel binding and partitioning into liquid-like droplets and that the 
chemical interactions that drive both processes are likely to be 
the same. 
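
The concordance quoted above (r = 0.83) is a correlation coefficient between two per-mutant measurements. The text does not state which coefficient was used, so the following minimal sketch assumes Pearson's r. The function name and the input values are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up illustration: each position is one mutant, expressed relative
# to the wild-type LC domain (1.0 = wild-type behavior).
hydrogel_binding = [1.0, 0.9, 0.5, 0.3, 0.1]
droplet_partitioning = [1.0, 0.8, 0.6, 0.4, 0.2]
r = pearson_r(hydrogel_binding, droplet_partitioning)
```

A value of r near 1 would indicate that mutants impaired in one assay tend to be impaired in the other, which is the reading the authors give to Figure 4C.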



See also Table S3.

Liquid-like Droplets Display the NAI Footprint Found in Hydrogel
Polymers and Nuclear hnRNPA2

If the effects of mutations on hydrogel binding and on partitioning into
liquid-like droplets correlate, it is possible that the LC domain of
hnRNPA2 might adopt similar structures in both states. To address this
question, we performed NAI footprinting on the LC domain of hnRNPA2 in
the context of the MBP:PTB:hnRNP LC domain fusion protein before TEV
cleavage, immediately upon seeing the formation of liquid-like droplets,
2 hr after droplet formation, and 18 hr after droplet formation.

As shown in Figure 5, evidence of the canonical NAI footprint
on the hnRNPA2 LC domain could be detected even before TEV
protease cleavage. The quantitative intensity of the footprint
became sequentially enhanced at each of the later time points.
Specifically, the degree of difference in NAI-protected residues
between native and denatured samples was, across all protected
residues, most pronounced in the 18 hr sample, less
so in the 2 hr sample, further reduced in the 10 min sample,
and least pronounced in the sample assayed prior to TEV protease
cleavage. For the 18 hr time point, the degree of protection
from NAI-mediated acetylation of buried side chains was
indistinguishable between liquid-like droplets and hydrogels.

Concordant with the observations of others who have studied
LC domain partitioning to liquid-like droplets (Lin et al., 2015;
Molliex et al., 2015; Patel et al., 2015), we conclude that, as a
function of time, LC polymerization is progressively enhanced
within liquid-like droplets. Petri et al. (2012) have reported similar
observations as a function of maturation of liquid-like droplets
formed from FG repeats associated with nucleoporin proteins.
In summary, mutational studies of the LC domain of hnRNPA2 
give evidence that similar forces drive both hydrogel retention 
and partitioning into liquid-like droplets, and NAI footprinting 
studies reveal evidence that the LC domain of hnRNPA2 adopts 
a similar structure in both settings. 

DISCUSSION 

Cells display a variety of organized puncta that, unlike mito- 
chondria, lysosomes, chloroplasts, and peroxisomes, are not 
membrane invested. These include various nuclear structures, 
including nucleoli, nuclear speckles and para-speckles, promye- 
locytic leukemia (PML) bodies, Cajal bodies and histone locus 
bodies (Mao et al., 2011). Cytoplasmic puncta include RNA gran- 
ules, P-bodies, neuronal granules, stress granules, and the polar 
granules of fly and worm embryos that assist in determination of
the germ lineage (Anderson and Kedersha, 2009). Light microscopic
studies of RNA granules have led to the idea that the
granule components exist in a liquid-like state separated in 
phase from the cytoplasm (Brangwynne et al., 2009). 

Studies that may be pertinent to the biochemical forces lead- 
ing to the organization of these cellular structures have begun 
to appear over the past several years. A potentially common 
conceptualization may tie two orthogonal approaches together. 
Li et al. (2012) have provided evidence that multivalent, poly- 
meric structures form when proteins containing repeated SRC 
homology 3 (SH3) domains are mixed with proteins containing 
repeated proline-rich motifs (PRMs). Upon heterotypic polymer- 
ization into dendritic assemblies, these proteins undergo phase 
separation into spherical, liquid-like droplets. 

Parallel and contemporary to the Li et al. (2012) study, we 
have been studying the LC sequences associated with a vari- 
ety of RNA binding proteins (Han et al., 2012; Kato et al., 
2012). In our case, concentrated samples of these proteins 
have been observed to adopt a gel-like state. Reasonably 
clear evidence has been gathered to support the conclusion 
that hydrogel formation equates to polymerization of the LC 
sequences. Studies of hydrogels have revealed X-ray diffraction
patterns consistent with cross-β structure, and electron
microscopic evaluation of hydrogels has revealed homoge- 
neous polymeric fibrils. 

Of significant concern to us has been the question as to 
whether the polymeric structures being studied in test tube reac- 
tions are of biological relevance. Heretofore, any linkage to bio- 
logical utility has been limited to correlative mutagenesis. One 
example of this indirect approach to biological significance has 
been studies of the LC domains of the FET proteins, FUS, 
EWS, and TAF15. All three of these paralogous proteins have 
amino terminal LC domains that can be translocated onto DNA 
binding domains as the causative event leading to human can- 
cer. When fused to the DNA binding domain of GAL4, the LC 
domains of FET proteins function as transcriptional activation 
domains (Riggi et al., 2007). Unbiased mutagenesis of the LC
domains of TAF15 and FUS has yielded scores of mutants
that affect polymerization to varying degrees. When these mutants
were tested for their capacity to activate transcription in living
cells, a strong correlative relationship was observed with
polymerization capacity (Kwon et al., 2013). Mutants fully capable of
polymerization activate gene expression potently, mutants mildly impeded
in polymerization activate transcription to an intermediate de- 
gree, and mutants that are incapable of polymerization fail to 
activate transcription. 

Here, we add a more direct approach to inquire whether LC 
domains might function in cells via the same chemistries and 
structures leading to LC domain polymerization in test tubes. A 
footprinting method was developed using NAI. This chemical 
acetylates amino acid side chains in a manner influenced by pro- 
tein structure and can be deployed as a reagent useful both for 
test tube biochemistry and the probing of native protein within 
freshly isolated nuclei (Figures 1 and S1). Using this approach,
we demonstrate that the footprint of the LC domain of
hnRNPA2 in recombinant polymers is highly related to the footprint
observed in nuclei (Figure 3). These observations are
consistent with the conclusion that the LC domain of at least
some proportion of hnRNPA2 in nuclei adopts a cross-β structure
similar to that characterized in recombinant polymers.

In considering the virtues and properties of liquid-like droplets 
as compared with hydrogels, we offer two contrasting perspec- 
tives. It is possible that the physical forces leading to the two 
states are entirely different. Recent studies of the DDX4 protein 
and LC domains associated with nucleoporin proteins charac- 
terized by FG repeats favor the utility of chemical interactions de- 
ployed to intertwine otherwise unstructured, random coil LC do- 
mains (Nott et al., 2015; Petri et al., 2012). In the case of DDX4, 
π-stacking between arginine and phenylalanine residues has
been highlighted as a key chemical determinant for phase separation
into liquid-like droplets. These interpretations are distinct
from the polymerization of LC domains into cross-β structure
that we consider to be the driving force for hydrogel formation. 

Here, we offer the alternative perspective that cross-β polymerization
may be at the heart of formation of both hydrogels
and liquid-like droplets. By constructing and studying 25 
mutated variants of the LC domain of hnRNPA2, we have found 
mutants that affect hydrogel binding significantly, mildly, or not 
at all (Figure 4A). The former category of mutants mapped almost 
exclusively to the region of the hnRNPA2 LC domain that 
was NAI resistant in the polymeric state (Figure 2). When tested 
for partitioning into liquid-like droplets, a strong correlative rela- 
tionship was observed with hydrogel binding (Figure 4C). Mu- 
tants strongly impeded in hydrogel retention partitioned poorly 
into liquid-like droplets, mutants partially impeded in hydrogel 
retention were mildly impeded from entering liquid-like droplets, 
and mutants that bound hydrogels as well as the wild-type 
LC domain of hnRNPA2 partitioned effectively into liquid-like 
droplets. 

We likewise deployed the NAI footprinting technique to liquid- 
like droplets and observed the same footprint as was found in 
hydrogels composed of the hnRNPA2 LC domain and nuclei 
freshly isolated from mammalian cells (Figure 5). Although these 
observations do not rule out the involvement of other chemical or 
physical forces in the formation of liquid-like droplets, we offer 
the conclusion that cross-β interactions between LC domains
are an important component of the forces facilitating phase separation
of LC sequences into liquid-like droplets.

Figure 6. Graphical Representation of Conversion of Soluble MBP:PTB:hnRNP LC
Fusion Protein into Liquid-like Droplet State

The triple fusion linking maltose binding protein 
(MBP, blue circle), the RNA binding domain of 
polypyrimidine tract-binding protein (PTB, green
rectangle), and the low complexity domain of
hnRNPA2 (LC domain, wavy line) remains soluble
and partially polymerized via the LC domain (red
sheets) prior to TEV cleavage and exposure to
synthetic RNA containing five PTB binding sites
(yellow rectangle). Following TEV cleavage and
exposure to RNA, MBP is left in solution and the
PTB:hnRNP LC domain fusion protein partitions
into a liquid-like droplet (gray shading) in a state of
enhanced polymerization.

If we adopt the simplistic interpretation that studies of hydrogels
and liquid-like droplets are variations on essentially the
same theme, one can consider the differences and utilities of 
the two systems. We reason that the sizes of polymers in hydro- 
gels are much longer than those found in liquid-like droplets and 
that the size distribution and dynamics of LC domains in the latter 
setting may be a better representation of how LC domains func- 
tion in living cells. The quantitative intensity of the NAI footprint 
in the various settings deployed in this study may be instructive 
in this regard. In hydrogel samples, the degree of NAI protection
in ordered regions of the protein was roughly 3× that of denatured
samples. In cells, the quantitative degree of protection
was roughly 1.5×. The NAI footprint observed in freshly prepared
liquid-like droplets yielded a quantitative degree of protection
more closely matching that of native hnRNPA2 as probed in
isolated nuclei. 

Paradoxically, evidence of the existence of a low level of cross-β
structure was seen in samples of the MBP:PTB:hnRNPA2 LC
domain triple fusion protein before TEV release of MBP, before 
exposure to the synthetic RNA containing iterative PTB binding 
sites, and before formation of liquid-like droplets (Figure 5). As 
recently articulated by Arosio et al. (2015), the initial nucleation 
of amyloid fibers is triggered within microseconds of protein mixing
during the lag phase of polymerization. Transition from lag to
growth phase of amyloid polymerization reflects, of course, a pro- 
found enhancement of the proportion of molecules existing in the 
fibrous state. We interpret the observation of an NAI footprint in 
samples of the triple fusion protein before TEV cleavage and 
exposure to the synthetic RNA substrate, and before the forma- 
tion of liquid-like droplets, to reflect the same phenomenon of fi- 
ber nucleation observed during the lag phase of polymerization of 
pathogenic amyloid fibers. This interpretation is displayed graph- 
ically in Figure 6. 

We have never thought or contended that LC polymers 
thousands of subunits in length are operative in living cells.
Indeed, LC domains are a cellular sink for
post-translational modification, including 
phosphorylation, acetylation, methylation, 
glycosylation, and PARylation (Choudhary 
et al., 2009; Lee, 2012; Zhang et al., 2013). 
Knowing that phosphorylation can regu- 
late the polymerization of LC domains (Han et al., 2012), we 
have every reason to believe that the behavior of LC domain poly- 
mers will be far more dynamic in living cells than in the hydrogels 
we have been studying for the past several years. 

Although hydrogels are aberrantly static, they
have offered a number of useful advantages. First, they have allowed
us to probe for structure, telling us that LC
domain polymerization is at the heart of hydrogel formation
(Kato et al., 2012). Second, they have given us assays to probe,
in an unbiased manner, for both proteins and RNAs that bind to 
hydrogels (Han et al., 2012). Third, they have allowed us to 
conduct correlative mutagenesis experiments in search of muta- 
tions that affect hydrogel binding as compared with other cellular 
activities (Kwon et al., 2013). Fourth, they have allowed us to 
study interaction with LC domains that, on their own, cannot
polymerize. These include the C-terminal domain (CTD) of RNA
polymerase II and the serine-arginine
repeat (SR) domains of pre-mRNA splicing factors, both of which
bind specifically to certain hydrogels in a manner regulated by 
the protein kinase enzymes known to control CTD and SR 
domain function (Kwon et al., 2013, 2014). Finally, in this report, 
we have employed hydrogels to develop the NAI footprinting 
strategy. 

Since the submission of this manuscript, four new papers have 
been published concerning the partitioning of LC domains into 
liquid-like droplets. Two of the four papers conclude that there 
is no biologic or physiologic role for cross-β polymerization of
these LC domains and that polymerization is solely reflective of
a pathologic state (Altmeyer et al., 2015; Patel et al., 2015).
The two other papers conclude that cross-β polymerization of
LC domains is not the driving force leading to the formation of
liquid-like droplets but that it may be of biologic utility during
the maturation of liquid-like droplets and/or RNA granules (Lin
et al., 2015; Molliex et al., 2015). These four papers concur
with the work of Elbaum-Garfinkle et al. (2015) and Nott et al.
(2015) indicating that the primary biologic utility of LC domains is
driven by forces other than cross-β polymerization, perhaps
including π-stacking of arginine and phenylalanine residues or
other forms of weak or “fuzzy” interactions involving unfolded
polypeptide domains.

Here, we offer the very different perspective that cross-β polymerization
commonly drives the formation of hydrogels, the
retention of LC domains trapped by hydrogels, the formation of
liquid-like droplets, the partitioning of LC domains into existing
liquid-like droplets, and the formation and maturation of RNA
granules. In other words, we submit the hypothesis that the
involvement of LC domains in the formation of RNA granules,
liquid-like droplets, and hydrogels relies on one and the same
phenomenon: cross-β polymerization. Further experimentation,
including derivation of the molecular structure of LC domains
existing in the labile, polymeric state, should help resolve
this controversy.

EXPERIMENTAL PROCEDURES 

Detailed experimental procedures are available in the Supplemental 
Information. 

Materials 

N-acetylimidazole was purchased from Sigma-Aldrich (USA). Ring 13C6-tyrosine
was purchased from Cambridge Isotope Laboratories (USA). The parental
vector for expression of the triple fusion protein of MBP:PTB:hnRNPA2 LC 
domain was provided by Dr. Michael Rosen of University of Texas South- 
western Medical Center. 

Preparation of Fusion Proteins 

His6:GFP or His6:mCherry linked to the LC domain of wild-type hnRNPA2 (residues
181-341) was overexpressed alone or co-expressed with human PPIA
and purified as described previously (Kato et al., 2012). Tyrosine-to-serine mutants
of GFP:hnRNPA2 LC were purified in the presence of 2 M guanidine
hydrochloride. 

Preparation of Heavy Proteins 

The stable-isotope-labeled (heavy) proteins (His6:hnRNPA2 LC domain and 
His6:GST) were prepared with ring 13C6-tyrosine (labeled with 13C on the six
carbons of the phenyl ring) by following published procedures (Baxa et al., 
2007). 

Acetylation of Recombinant Proteins

To acetylate denatured (heavy) proteins, the proteins were denatured by 5 M
GuSCN, and acetylation reactions were carried out with 30 mM NAI and
1 mg/ml protein. The reactions were quenched by 0.8 M Tris-HCl (pH 8.8).
The light proteins were acetylated in native conditions (without GuSCN). The
native and denatured proteins were mixed at a 1:1 ratio, digested by chymotrypsin,
and analyzed by mass spectrometry.
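
The procedure above pairs a light (native) and a heavy (denatured) copy of each acetylated peptide in one mass spectrometry run, and Figure 5B reports protection as a log2 ratio. As a minimal sketch, and assuming that protection is scored as the heavy (denatured) acetylated signal divided by the light (native) acetylated signal, the calculation could look as follows. The function name, the direction of the ratio, and the input numbers are assumptions made for illustration; the exact quantification is described in the Supplemental Information.

```python
from math import log2


def log2_protection(light_native_signal, heavy_denatured_signal):
    """Log2 of denatured/native acetylation signal for one residue.

    Assumed convention: a positive value means the residue is acetylated
    less in the native sample than in the denatured one, i.e. protected.
    """
    if light_native_signal <= 0 or heavy_denatured_signal <= 0:
        raise ValueError("signals must be positive")
    return log2(heavy_denatured_signal / light_native_signal)


# Hypothetical intensities (arbitrary units) for a single tyrosine.
unprotected = log2_protection(2.0, 2.0)   # equal acetylation in both states
protected = log2_protection(1.0, 3.0)     # 3-fold less acetylation when native
```

Under this convention, the roughly 3× and 1.5× protection values quoted in the Discussion correspond to log2 values of about 1.58 and 0.58.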

Acetylation of Nuclei

293T cells were cultured in DMEM high-glucose media containing either light
tyrosine or 13C6-tyrosine. Intact nuclei from heavy or light cells were purified in
hypotonic buffer and washed with β-mercaptoethanol (BME)-free buffer. Light
intact nuclei were resuspended in a nuclei buffer, and heavy nuclei were denatured
in the nuclei buffer with 5 M GuSCN. Both samples were acetylated with
30 mM NAI at RT for 15 min, quenched by Tris, and mixed together at a 1:1
ratio. The mixture was digested by chymotrypsin and then analyzed by 
mass spectrometry. 

Acetylation of Liquid-like Droplets 

Liquid-like droplets of the MBP:PTB:hnRNPA2 LC triple fusion protein were 
prepared as described (Lin et al., 2015). For the native sample, NAI (30 mM)
was added to the protein solution before TEV cleavage or after TEV cleavage
at the indicated time points. After incubation for 15 min at room temperature
(RT), the reaction was quenched by Tris. Acetylated His6-tagged hnRNPA2
LC (heavy) protein was used for the denatured sample. The two samples 
were mixed, digested by chymotrypsin, and then analyzed by mass 
spectrometry. 

Recruitment Assays with Hydrogels and Liquid-like Droplets 

Hydrogel droplets of mCherry:LC domain of wild-type hnRNPA2 were pre- 
pared as described (Kato et al., 2012). GFP:hnRNPA2 LC wild-type or mutant 
proteins were diluted to 1 µM in 1 ml of a gelation buffer and pipetted into the
hydrogel dish. After overnight incubation at 4°C, horizontal sections of the
hydrogel droplets were scanned at both the mCherry and GFP excitation
wavelengths by a confocal microscope. GFP signals at a boundary area of
the hydrogel droplets were scanned by the program ImageJ (Schneider
et al., 2012). Liquid-like droplets formed from MBP:PTB:hnRNPA2 LC and
the synthetic RNA substrate were incubated with 0.1 µM of GFP:hnRNPA2
LC wild-type or mutant domains. The droplets were deposited on a cover slide
and imaged by a fluorescence microscope. GFP signals inside and outside of
the liquid-like droplets were measured by the program ImageJ. The partition
ratio of GFP:hnRNPA2 proteins was calculated by dividing the signal inside 
the droplet by the signal outside. 
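
The partition ratio defined above is a simple quotient of two fluorescence measurements. The sketch below illustrates it, together with one possible way to express how strongly a mutant is impeded relative to wild type. The function names, the impediment formula, and the numbers are assumptions for illustration only; the text does not state how the "more than 50%" threshold was computed.

```python
def partition_ratio(signal_inside, signal_outside):
    """GFP signal inside the droplet divided by the signal outside it."""
    if signal_outside <= 0:
        raise ValueError("outside signal must be positive")
    return signal_inside / signal_outside


def fractional_impediment(mutant_ratio, wild_type_ratio):
    """Assumed metric: fractional loss of partitioning relative to wild type."""
    if wild_type_ratio <= 0:
        raise ValueError("wild-type ratio must be positive")
    return 1.0 - mutant_ratio / wild_type_ratio


# Hypothetical measurements in arbitrary fluorescence units.
wild_type = partition_ratio(800.0, 100.0)   # 8.0
mutant = partition_ratio(200.0, 100.0)      # 2.0
loss = fractional_impediment(mutant, wild_type)
```

With these made-up numbers the mutant would count as impeded by more than 50% under the assumed metric.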

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
four figures, and three tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.10.040.

SUMMARY

Bacteria acquire memory of viral invaders by incor- 
porating invasive DNA sequence elements into the 
host CRISPR locus, generating a new spacer within 
the CRISPR array. We report on the structures of 
Cas1-Cas2-dual-forked DNA complexes in an effort
toward understanding how the protospacer is 
sampled prior to insertion into the CRISPR locus. 
Our study reveals a protospacer DNA comprising a 
23-bp duplex bracketed by tyrosine residues, 
together with anchored flanking 3' overhang seg- 
ments. The PAM-complementary sequence in the 3' 
overhang is recognized by the Cas1a catalytic sub- 
units in a base-specific manner, and subsequent 
cleavage at positions 5 nt from the duplex boundary 
generates a 33-nt DNA intermediate that is incorpo- 
rated into the CRISPR array via a cut-and-paste 
mechanism. Upon protospacer binding, Cas1-Cas2 
undergoes a significant conformational change, 
generating a flat surface conducive to proper proto- 
spacer recognition. Here, our study provides impor- 
tant structure-based mechanistic insights into 
PAM-dependent spacer acquisition. 

INTRODUCTION 

The clustered regularly interspaced short palindromic repeats 
(CRISPR), together with CRISPR-associated (Cas) proteins, 
form the microbial adaptive immune system that protects 
against invading phages and plasmids. The CRISPR array con- 
sists of identical short repeats interspaced by similarly sized 
variable spacers, which are acquired from the foreign DNA (Fig- 
ure 1A) (Barrangou et al., 2007; Barrangou and Marraffini, 2014; 
Brouns et al., 2008). An A-T-rich leader sequence located up- 
stream of the first repeat is essential for spacer acquisition (Yosef 
et al., 2012) and promotes the transcription of the CRISPR array
(Pougach et al., 2010). The CRISPR-Cas system defends against
invasive nucleic acids from phages or plasmids in three steps
(van der Oost et al., 2014). First, in the spacer acquisition step
(also called adaptation), a new spacer is acquired from the
invader DNA and integrated into the CRISPR locus (Barrangou
et al., 2007; Fineran and Charpentier, 2012). Second, the 
CRISPR locus is transcribed and processed into short mature 
CRISPR RNA (crRNA), which then binds to Cas proteins and 
forms a protein-RNA complex (Brouns et al., 2008). Finally, the 
invading nucleic acid complementary to crRNA is recognized 
and degraded by the protein-crRNA complex (Garneau et al., 
2010; Hale et al., 2009; Marraffini and Sontheimer, 2008). While
the expression and interference steps
are now well characterized in molecular and functional terms, the
adaptation step still awaits detailed analysis.

Recent studies have shown that the protospacer-adjacent
motif (PAM) is fundamental to avoiding autoimmunity. Only if the
invading DNA is flanked by the correct PAM can it be cleaved
during interference (Deveau et al., 2008). Furthermore, it was
shown that PAMs are of critical importance for recognition and
selection of protospacers during acquisition. It was found that
protospacers flanked by the correct PAM could be incorporated
into the CRISPR array (Horvath et al., 2008; Mojica et al., 2009).
Interestingly, in Escherichia coli, the last nucleotide of the new
repeat is derived from the first nucleotide of the incoming spacer, 
and this nucleotide is indeed the last nucleotide of the PAM 
sequence (Datsenko et al., 2012). 

Cas1 and Cas2 are the only two Cas proteins universally 
conserved across all CRISPR-Cas systems (Makarova et al., 
2011). Previous in vitro analysis showed that Cas1 is a metal- 
dependent DNase, capable of cleaving single-stranded (ss) 
DNA, double-stranded (ds) DNA, cruciform DNA, and branched 
DNA in a sequence-independent manner (Babu et al., 2011; Wiedenheft
et al., 2009). Likewise, Cas2 was identified as a metal-dependent
endoribonuclease that cleaves ssRNA or dsDNA
(Beloglazova et al., 2008; Gunderson et al., 2015; Ka et al.,
2014; Nam et al., 2012) or, alternately, shows no significant
nuclease activity (Samai et al., 2010). However, one recent study
demonstrated that the “active site” of Cas2 is not required for
spacer acquisition (Nunez et al., 2014), suggesting that Cas2
could have other, as-yet unknown functions.

Figure 1. Crystal Structure (2.6 Å) of E. coli Cas1-Cas2 Bound to a Dual-Forked DNA

(A) A representation of the CRISPR-Cas locus of E. coli K12. The CRISPR locus consists of a series of repeats (orange diamonds) that are separated by spacer
sequences (red rectangles) of constant length. Cas1 and Cas2 are shown in magenta and green colors, respectively.

(B) Schematic diagram of the dual-forked DNA, which is a 23-mer palindromic duplex with 5'-(T)6 and 3'-(T)10 overhangs on both ends. The nucleotides in the 5'
overhangs are numbered from -6 to -1; those in the DNA duplex are numbered from 1 to 23; and those in the 3' overhang are numbered from 24 to 33. The two
strands of DNA are colored in red and blue, respectively.

(C) Structure of the dual-forked DNA in the Cas1-Cas2 complex.

(D) Orthogonal views of the crystal structure of the complex of Cas1-Cas2 bound to the dual-forked DNA. Cas1a and Cas1a' are shown in light orange, and
Cas1b and Cas1b' are shown in magenta. Two monomers of Cas2 are in green and cyan, respectively. The proposed Arch segment is labeled.

(E) The surface view of the Cas1-Cas2 dual-forked DNA complex in the same orientation as Figure 1D, bottom.

See also Figure S1 and Table S1.

Overexpression of E. coli Cas1 and Cas2 induces new
spacer acquisition by inserting exactly 33 nt of foreign DNA
behind the first repeat, indicating that Cas1 and Cas2 are
both necessary and sufficient for new spacer acquisition. Previous
studies demonstrated that Cas1 and Cas2 form a stable
complex, which functions as an integrase that incorporates the
new spacers into the CRISPR locus (Arslan et al., 2014; Nunez
et al., 2014, 2015; Rollie et al., 2015). In E. coli, the integration 
process involves the staggered cleavage of the first CRISPR 
repeat, and new spacers are incorporated proximal to the 
leader sequence (Yosef et al., 2012). From this, three funda- 
mental questions arise as to how Cas1-Cas2 mediates the 
spacer acquisition. First, what are the physiological DNA sub- 
strates of Cas1-Cas2, and what are the respective roles of 
Cas1 and Cas2 proteins? Second, while the spacers are known 
to be of a set length in each species, what are the molecular 
mechanisms underlying spacer length determination? Third, 
how does the acquisition machinery select protospacers con- 
taining a PAM sequence? 



To understand the molecular mechanisms of spacer acquisi- 
tion, we determined the crystal structure of E. coli Cas1-Cas2 
bound with dual-forked DNA. Our structure highlights the 
following mechanistic principles related to new spacer acquisi- 
tion. We demonstrate that the protospacer DNA captured by 
Cas1-Cas2 adopts a dual-forked form, with the 3' overhangs 
of the protospacer essential for new spacer acquisition. The 
PAM-complementary sequence (5'-CTT-3'), located within the 
3' overhang, is recognized in a sequence-specific manner 
and is cleaved by Cas1a, generating a DNA intermediate that 
has 5-nt 3' overhangs on the two partner strands. Given that 
tyrosine residues cap either end of a 23-bp duplex, Cas1- 
Cas2 predetermines the length of the newly acquired spacer,
thereby highlighting the role of both Cas1 and Cas2 in the
acquisition mechanism. Moreover, Cas1-Cas2 undergoes a 
significant conformational change upon protospacer binding, 
thereby generating optimal protospacer and target binding 
sites. 
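
The lengths quoted in this article fit together arithmetically: a 23-bp duplex with a 5-nt 3' overhang remaining on each strand after cleavage spans 33 nt, the size of the E. coli spacer. The short sketch below works through that arithmetic using the numbering of Figure 1B. The function names are hypothetical, and the reading that cleavage occurs after position 28 is an inference from "5 nt from the duplex boundary", not a statement taken from the text.

```python
DUPLEX_BP = 23          # length of the duplex bracketed by tyrosine residues
TRIMMED_OVERHANG_NT = 5  # 3' overhang left on each strand after cleavage


def intermediate_span(duplex_bp, overhang_nt):
    """Total span of a duplex carrying one 3' overhang at each end."""
    if duplex_bp < 0 or overhang_nt < 0:
        raise ValueError("lengths must be non-negative")
    return duplex_bp + 2 * overhang_nt


def last_retained_position(duplex_bp, overhang_nt):
    """Last nucleotide kept on a strand, in the numbering of Figure 1B.

    Duplex positions run from 1 to duplex_bp and the 3' overhang follows,
    so a cut overhang_nt past the duplex boundary keeps positions up to
    duplex_bp + overhang_nt (assumed reading of the cleavage site).
    """
    if duplex_bp < 0 or overhang_nt < 0:
        raise ValueError("lengths must be non-negative")
    return duplex_bp + overhang_nt


span = intermediate_span(DUPLEX_BP, TRIMMED_OVERHANG_NT)
cut_after = last_retained_position(DUPLEX_BP, TRIMMED_OVERHANG_NT)
```

Here `span` evaluates to 33 and `cut_after` to 28, so under this reading each 10-nt 3' overhang of the crystallized substrate (positions 24 to 33) would be trimmed to its first 5 nt.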

RESULTS 

Crystal Structure of Cas1-Cas2 Bound to Single-Forked 
DNA 

Both Cas1 and Cas2 are capable of cleaving various types of 
DNA in vitro. However, the exact DNA substrate of Cas1-Cas2
in vivo has remained unknown. To obtain a crystal of the
Cas1-Cas2-DNA complex, we co-crystallized the protein complex
with various DNAs. As shown in Figure S1A, initially only
the single-forked DNA containing a 10-bp duplex and 3' and 5'
oligo-T overhangs of 10-nt length crystallized, resulting in a
low-resolution structure of this complex at 4.5 Å.

Search and Optimization of the DNA Substrate 

In terms of nomenclature, within each symmetric half of the com- 
plex, the proteins are labeled Cas1a, Cas1b, and Cas2 and 
Cas1a', Cas1b', and Cas2'. Analysis of our structures showed 
that this complex contains a pair of Cas1 dimers sandwiching 
one Cas2 dimer (Figure S1B), similar to the structure of DNA- 
free Cas1-Cas2 (Nunez et al., 2014). In this 2-fold symmetric 
complex, the two single-forked DNAs lie on the surface of the 
Cas1-Cas2 in a head-to-head orientation. Each 10-bp duplex 
lies on the interface of a Cas1a/b dimer, with the fork facing 
toward the edge of the Cas1a/b dimer and the duplex end positioned
on the Cas1-Cas2 interface. These findings strongly indicate
that the two DNA forks always face toward the outside of
Cas1-Cas2, suggesting that this orientation of the forks is fixed 
in the protein complex. 

While the two forks are facing outward, the blunt ends of both 
duplexes extend toward the center, where the Cas2 dimer is 
located. Interestingly, the blunt ends do not meet but leave a 
gap in between, indicating that Cas1-Cas2 associates with 
duplex DNA longer than 20 bp. To test this assumption, we 
used various substrates, including single-forked DNA containing
either 11- or 12-bp duplexes and dual-forked DNA with duplexes
of 21-24 bp in length, flanked by 3' and 5' overhangs at both
ends. To our surprise, the complex with dual-forked DNA substrates
resulted in crystals with greatly improved diffraction,
from which we obtained a structure of the complex at a higher
resolution of 2.6 Å. This result suggests that this dual-forked
DNA is closely related to the in vivo substrate used by Cas1- 
Cas2. 

Dual-Forked DNA Is the Substrate of Cas1-Cas2 

Among the substrates tested, a dual-forked DNA
with a 23-bp duplex flanked by 3'-terminal (T)10 and 5'-terminal
(T)6 overhangs (Figure 1B) gave crystals that diffracted to the
highest resolution. The structure of the complex was refined to
Rwork/Rfree values of 0.179/0.207 (Table S1). The asymmetric
unit contains one Cas1-Cas2-DNA complex, which possesses
a pair of asymmetric Cas1 dimers (Cas1a/b and Cas1a'/b') and
one symmetric Cas2 dimer, together with one dual-forked DNA
substrate (Figures 1C, 1D, and S1C). The entire Cas1-Cas2-DNA
complex exhibits 2-fold symmetry, with each half
composed of Cas1a, Cas1b, and Cas2 subunits and bound
DNA substrate.

In detail, the pair of symmetric Cas2 subunits are sandwiched
between the pair of asymmetric Cas1 dimers (Figure 1D), similar
to the single-forked DNA-bound Cas1-Cas2 complex (Figure S1B).
The Cas1a/b dimer is structurally similar to its symmetry-related
Cas1a'/b' dimer counterpart, with Cas1a being similar
to Cas1a' and Cas1b similar to Cas1b'. Cas1-Cas2 is shaped like
a wings-down butterfly, containing one flat top surface and an
arch-shaped surface on the opposite face (Figures 1D, top,
and S1D). In our structure, 14 amino acids at the N-terminal tails
of Cas1a and Cas1a' and ~40 amino acids at the C-terminal tails
in both Cas1 subunits were disordered.

Within our crystal structure of the complex, the designed 
DNA features visible forks at either end, with a 23-bp duplex 
sandwiched between fork elements. The dual-forked DNA lies 
on the flat surface of Cas1-Cas2, and the two 3' overhangs 
thread into the C-terminal domains of Cas1a and Cas1a', 
respectively (Figure 1E). We observe a multitude of intermolecular
interactions between the 3' overhangs and the protein,
further indicating that the dual-forked DNA is a robust substrate 
for the cleavage reaction by Cas1-Cas2, as discussed further 
below. 

The DNA Duplex Segment Slots into the Flat Surface 
Provided by Cas1-Cas2 

Next, we investigated the interaction between the DNA and the 
protein in the complex in greater detail. The 23-bp duplex 
closely follows the contours of the flat surface at the top of 
Cas1-Cas2, starting from Cas1a'/b' at one end, reaching across
to Cas1a/b at the other end, and interacting with intervening
Cas2 along its path (Figure 1E). Comparison of the duplex in
the dual-forked DNA with the canonical B-form duplex DNA 
shows that the interaction between the duplex and Cas1-Cas2 
induces bending of the DNA (Figure S2A). As shown in Figure 2A, 
either end of the duplex straddles the Cas1 dimer interface. In 
this region, the duplex forms hydrogen bonds via its phosphate 
groups with Arg59, Arg245, and Arg248 of Cas1a' and Val27, 
Asp29, Gly30, and Ser61 of Cas1b' (Figures 2B and 2C). The
last four base pairs (positions 19-23) of the duplex segment 
are stabilized by the Cas1a/b dimer in a similar manner to that 
observed for the symmetry-related first four base pairs (posi- 
tions 1-4). 

The central segment of the duplex lies on the surface of the 
Cas2 dimer and is stabilized by charge-charge interactions via 
its phosphate backbone with the positively charged Cas2 sur- 
face (Figure S2B). As shown in Figure 2D, the side chains 
involved in these interactions are from Arg14, Arg16, Arg77, 
and Arg78, together with the main chain of Asn10. Individual
substitutions of these Arg residues by Ala and the double mutant of
Arg77Ala and Arg78Ala reduced spacer acquisition. In addition,
no new spacer acquisition was observed for the Arg14Ala and
Arg16Ala dual mutant (Figure 2E). Together, these results indicate
that the interactions between Cas2 and duplex DNA are crucial 
for spacer acquisition. 

Figure 2. Positioning of Dual-Forked DNA onto Cas1-Cas2

(A) One terminus of the duplex straddles the Cas1 dimer interface.

(B and C) Detailed view of the interaction between Cas1 dimer and DNA duplex.

(D) Detailed view of interaction between Cas2 dimer and DNA duplex.

(E) Agarose gel of in vivo acquisition assays involving mutations of duplex-binding Cas2. WT, wild-type.

(F) Tyr22 residues from Cas1a and Cas1a' bracket the 23-bp duplex, which is positioned on the flat surface of Cas1-Cas2.

(G) A simplified view (with Cas proteins removed) of the DNA 5' and 3' overhangs at one end of the complex.

(H) 3' overhang lies in the groove of the C-terminal domain of Cas1a shown in surface view representation. The phosphate groups are shown in yellow.
Nucleotides 28-30 are labeled with a green background, with the cleavage site shown by a red arrow.

(I) Magnified view of the interaction between nucleotides 24-26 and Cas1.

(J) Magnified view of the interaction between nucleotides 27-28 and Cas1. Glu141, His208, and Asp221 are the catalytic residues of Cas1. The DNA cleavage site
is indicated by a red arrow.

See also Figure S2. 

Two Tyrosine Residues Determine the 23 nt Length
of the Bracketed Duplex Segment

Next, we investigated what specific interactions with the protein 
determine the length of the duplex segment in the complex. As 
shown in Figures 2B and 2F, the first base pair of the duplex 
stacks on the side chain of Tyr22 of Cas1a', and the last base 
pair stacks on the Tyr22 of Cas1a. Such bracketing by the tyro- 
sines prevents additional base pairs from participating in the 
duplex structure, with the tyrosines in addition serving as 
wedges that generate duplex single-strand junctions (Figure 2B). 
Thus, these two tyrosines from the symmetry-related Cas1a 
subunits serve as a caliper to measure a 23-bp duplex segment 
of the bound DNA (Figure 2F). In the case of this E. coli Cas1- 
Cas2-DNA complex structure, the distance between these two 
Tyr22 residues is 76 Å, creating a ruler that fits a B-form DNA
duplex with the length of 22-23 bp. 

To investigate whether the distance between the two Tyr22 
residues is a function of the length of the duplex, we analyzed 
additional structures containing DNAs of shorter duplex length. 
Contrary to our expectations, the length of the duplex found in 
the structure of Cas1-Cas2 bound to the dual-forked DNA with 
22-bp duplex was not 22 bp but, rather, 23 bp, identical to the 
complex containing 23-bp duplex dual-forked DNA discussed 
above (Figures S2C-S2E). Thus, the assembly of the Cas1 and 
Cas2 complex forms the basis for the two side chains of Tyr22 
residues from Cas1a and Cas1a' to work together as a ruler 
that defines the precise length of the duplex. In type I-E
Cas1, Tyr22 is conserved to a certain extent, being always a
planar/large side-chain residue (such as His or Arg), which
could possibly also stack with the base pairs at both ends 
(Figure S3). Together, these observations strongly suggest 
that the duplex length is not simply a result of our DNA design 
but is a function of the intrinsic properties of the Cas1-Cas2- 
DNA complex. This explains how Cas1-Cas2 provides a ruler 
that measures with great precision the length of the DNA 
duplex. 
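
As a rough consistency check on the ruler argument, the 76 Å spacing between the two Tyr22 side chains can be converted into base pairs. The sketch below assumes a rise of about 3.4 Å per base pair, the textbook value for canonical B-form DNA, which is not a number reported in this study. The function and constant names are illustrative only, and because the bound duplex is bent the result is only an estimate.

```python
# Minimal sketch: convert the Tyr22-Tyr22 distance into a duplex length.
# ASSUMPTION: 3.4 Angstrom rise per base pair (textbook B-form value,
# not a value measured in this study).
B_FORM_RISE_ANGSTROM = 3.4


def duplex_length_bp(distance_angstrom: float,
                     rise: float = B_FORM_RISE_ANGSTROM) -> float:
    """Return the number of base pairs spanning the given distance."""
    return distance_angstrom / rise


# 76 Angstrom is the Tyr22-Tyr22 distance quoted in the text.
estimate = duplex_length_bp(76.0)
print(f"{estimate:.1f} bp")  # falls between 22 and 23 bp
```

Under that assumption the estimate lands between 22 and 23 bp, in line with the range quoted above.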

3' Overhangs Thread through the C-Terminal Domains
of Cas1a

As the two Tyr22 residues act as wedges between the duplex 
and overhangs at the fork site, they cause a flip of the 3' over- 
hangs away from the duplex (Figure 2G). As a consequence, 
the 3' overhangs thread through the C-terminal domain of 
Cas1a (Figure 2H) in a similar manner at both ends of the com- 
plex. The 10-nt 3' overhangs (numbered 24-33) adopt an irreg- 
ular curve-line conformation and form extensive intermolecular 
interactions with the C-terminal domains of Cas1a (Figure 2H). 
Nucleotide 24 flips away from the duplex, with its phosphate 
groups stabilized by hydrogen bonding to residues Glu80 and 
Tyr86 of Cas1a (Figure 2I). Nucleotide 25 is stabilized via stacking
on the side chain of Trp3 of Cas1b, with further stabilization
via interaction of its phosphate group with Arg84 of Cas1a
(Figure 2I). Nucleotides 26-28 are stabilized via interactions with
residues Trp170, Arg163, Thr184, Tyr188, His208, and Tyr217 of
Cas1a (Figures 2I and 2J). Thus, these intermolecular interac- 
tions stabilize the bound single-stranded 3' overhangs at either 
end, which is likely to be a pre-requisite for proper cleavage 
function of Cas1 (see below). 



PAM Recognition 

The molecular basis for the selection of the protospacer remains 
unknown. In E. coli, spacers are chosen from protospacers
containing a 5'-AAG-3' PAM sequence, and it was shown that the
protospacer is cleaved between G-1 and A-2 within the PAM
and that G-1 is inserted along with the protospacer (Datsenko
et al., 2012; Goren et al., 2012; Swarts et al., 2012). In our
structure, the cleavage is found between nucleotides 28 and 29 as
described later, suggesting that nucleotides 28-30 in the 3' over- 
hang are complementary to the PAM sequence. Therefore, 
in vivo, these three nucleotides in the overhang should contain 
the sequence 5'-CTT-3', as this is complementary to the PAM 
5'-AAG-3' sequence. 
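
The complementarity stated above can be checked with a small sketch that computes the reverse complement of the PAM, read 5' to 3' on the opposite strand. The helper name and lookup table are illustrative and not part of any analysis pipeline used in the study.

```python
# Minimal sketch: the strand complementary to the 5'-AAG-3' PAM,
# written 5' to 3', is its reverse complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


pam = "AAG"
print(reverse_complement(pam))  # CTT
```

The C at the first position of the result pairs with the conserved G of the PAM, which is why nucleotide 28 of the overhang is expected to be a cytosine.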

To provide insights into the molecular mechanism of PAM 
recognition by Cas1, we next determined the crystal structure
of E. coli Cas1-Cas2 bound to DNA containing the
PAM-complementary 5'-CTT-3' sequence (Figures 3A-3C and Movie S1)
instead of the original oligo-T sequence at positions 28-30. 
The overall structure of the PAM-complementary-containing 
complex is similar to the oligo-T-containing complex, though 
there are some important differences. Therefore, we will discuss 
below the PAM-complementary bound region, as well as those 
regions that differ between the PAM-complementary and oligo- 
T-bound structures of the complex. Given that the two 3' over- 
hangs bearing the PAM-complementary sequence insert into 
the C-terminal domain of Cas1a and Cas1a' in the same manner,
we will describe only the structural features of the 3' overhang 
bound to Cas1a. 

As shown in Figure 4A, seven nucleotides were visible at the 3' 
overhang, where they adopt a hook-shaped curve and meander 
through the C-terminal domain of Cas1a. Nucleotides 24-27 are 
stabilized by Cas1a in the PAM-complementary-containing 
complex, in a manner similar to that observed in the oligo-T-con- 
taining complex described above. Nucleotides C28, T29, and 
T30 are positioned orthogonally to each of their preceding nucle- 
otides and fit into a binding pocket provided by the C-terminal 
domain of Cas1a and the C-terminal tail of Cas1b. It is clear 
from the PAM-complementary-containing complex structure 
that this pocket is base specific for the CTT sequence. The 
nucleotide C28, which is complementary to the conserved G in 
the PAM sequence, is read out by two base-specific 
hydrogen-bonding interactions. The Watson-Crick edge of C28 
forms a hydrogen bond with the side chain of Lys211 of Cas1a
and with the non-bridging phosphate oxygen of nucleotide 27
(Figure 4B). The pyrimidine ring of C28 is further stabilized as a
result of being sandwiched between the side chains of Tyr217
(Cas1a) and Ile291 (Cas1b) residues. The base of T29 is flexible
in the oligo-T-containing structure. By contrast, in the
PAM-complementary-containing complex, the base of T29 stacks on
the side chain of Gln287 of Cas1b, with its Watson-Crick edge
forming a base-specific hydrogen bond with the backbone oxygen
of Arg138 of Cas1a. Further, the non-bridging phosphate oxygen
atoms of T29 form hydrogen bonds with the side chains of
His208 from Cas1a and Gln287 of Cas1b (Figure 4C). T30, whose
base stacks on the side chain of Tyr165, is also recognized in a
sequence-specific manner by forming hydrogen bonds involving
its Watson-Crick edge with the main chain of Tyr165 in the
PAM-complementary-containing complex (Figure 4C).

Figure 3. Crystal Structure of E. coli Cas1-Cas2 Bound to a PAM-Complementary Dual-Forked DNA

(A) Schematic diagram of the PAM-complementary dual-forked DNA, which is a 23-mer palindromic duplex with 5'-(T)6 and 3'-(T)10 overhangs on both ends. The
PAM-complementary sequence 5'-CTT-3' is highlighted by the green background.

(B) Fo-Fc omit map (gray color, contoured at 3.0 σ) of the nucleotides 26-30 in the structure with the PAM-complementary sequence within the 3' overhangs.

(C) Orthogonal views of the crystal structure of the complex of Cas1-Cas2 bound to the PAM-complementary dual-forked DNA.

See also Figure S3 and Movie S1.



To investigate how the base-specific interaction between
Lys211 and C28 is related to conservation of the G residue,
which is present at the 5' end of most of the newly acquired
spacers, we sequenced newly acquired spacers within either
wild-type or the Lys211Ala Cas1 mutant. We found that, in the
wild-type Cas1, ~76% of new spacers are flanked by a 5' G,
whereas this is reduced to 47% in the Lys211Ala mutant. The
Watson-Crick edge of C28 is recognized in a sequence-specific
manner via two hydrogen bonds. Removing one base-specific
interaction with C28 by substituting Lys211 with Ala markedly
decreased the degree of G conservation (Figure 4D). Thus, the
interaction between the base of C28 and Lys211 is important
for the insertion of the conserved G.

Single-Stranded Nature of the 3' Overhang Is Critical for
New Spacer Acquisition

To test the significance of the 3'-terminal single strand and the
PAM-complementary sequence, we conducted electrophoretic
mobility shift assays (EMSA). As shown in Figure 4E, the presence
of 3' overhangs significantly increases the binding affinity
between Cas1-Cas2 and DNA. Cas1-Cas2 binds blunt-end
double-stranded DNA with lower affinity than dual-forked DNA.
Using a DNA duplex flanked by a 4-nt 3' overhang at both ends
moderately increased the affinity for Cas1-Cas2. However, the
binding affinity increased significantly upon extension of the 3'
overhang to either 7 or 10 nt, with no further change on proceeding
from 7 to 10 nt. By contrast, weak binding was observed when
the DNA substrate contained 10-nt 5' overhangs (Figure S4A),
implying a modest contribution to binding from the 5' overhang.
Most importantly, the binding is much stronger when the 7-nt 3'
overhang contains the PAM-complementary 5'-CTT-3'
sequence (Figure 4E) compared to 5'-TTT-3', 5'-TCC-3', and
5'-GAA-3' sequences (Figure 4F), establishing that the 5'-CTT-3'
PAM-complementary sequence is crucial for high-affinity
protospacer binding by Cas1-Cas2.

Impact of DNA-Binding Cas1 and Cas2 Mutants on
Complex Formation

As shown in Figure 4A, the 3' overhangs are located within the 
C-terminal domain of Cas1a, where they are stabilized by 
numerous intermolecular interactions (Figure 5A). With the
exception of the PAM-complementary sequence, the 3' overhangs
bind to the Cas1 dimer mainly through non-sequence-specific
interactions. Aromatic residues Tyr165, Trp170, and
Tyr217 on Cas1a are involved in stacking interactions with the
bases of the 3'-overhang segment. We observe in an EMSA
assay a modest decrease in binding affinity for the alanine- 
substituted Tyr165 and Trp170 dual mutant, while a more pro- 
nounced decrease is observed for the Tyr165 and Tyr217 dual 
mutant (Figure 5B, top), with the latter two involved in comple- 
mentary-PAM recognition (Figures 4B and 4C). In addition, a 
significant reduction in binding affinity is observed for alanine- 
substituted Arg14 and Arg16 dual mutant (Figure 5B, bottom), 
consistent with these Cas2 residues involved in intermolecular 
recognition with the duplex segment (Figure 5A). 

Impact of DNA-Binding Cas1 Mutants on In Vivo Spacer
Acquisition

Tyrosine residues 165, 188, and 217, as well as Lys211 on 
Cas1a, are involved in intermolecular recognition of the 

Figure 4. PAM-Complementary Segment Recognition

(A) The 3' overhang containing the PAM-complementary sequence motif lies in the groove of the C-terminal segment of Cas1a and is covered by the C-terminal tail
of Cas1b. The nucleotides complementary to the PAM are labeled by green background.

(B and C) The detailed sequence-specific interactions between Cas1 and C28 (B) and T29 and T30 (C) residues. The DNA cleavage site is indicated by a red arrow
in A and C.

(D) Sequence logos obtained after the alignment of the first ten nucleotides of the new insertion. Numbers indicate the positions of the nucleotide of the new
insertion. Number of sequences used in each alignment is indicated as n.

(E) Electrophoretic mobility shift assay using 5' Cy3-labeled double-stranded DNA containing 23-bp duplex and the 23-bp duplex with 4-, 7-, or 10-nt 3'
overhangs on both ends. The 23-7-CTT and 23-10-CTT DNAs harbor the PAM-complementary sequence 5'-CTT-3', as shown in Table S2.

(F) Electrophoretic mobility shift assay using 5' Cy3-labeled non-PAM-complementary DNAs with 23-bp duplex and 7-nt 3' overhangs. The PAM-complementary
sequence 5'-CTT-3' was replaced by 5'-TCC-3', 5'-GAA-3', or 5'-TTT-3'.

See also Figure S4.

PAM-complementary sequence of the 3' overhang in the 
Cas1-Cas2-DNA complex (Figures 4B, 4C, and 5A). Replacement
of individual Tyr165, Tyr188, and Tyr217 by alanine resulted
in significant reduction in spacer acquisition in an in vivo
assay, while a modest reduction was observed for the
Lys211Ala mutant, as shown in Figure 5C, top. Interestingly,
Tyr22, which is involved in bracketing the duplex segment
(Figure 2F), shows only a modest decrease in spacer acquisition
on replacement by alanine (Figure 5C, top). This was
unanticipated but may reflect the dominant role of intermolecular
interactions involving the 3'-overhang segment in generation
of the duplex single-strand junction, as reflected in loss of
spacer acquisition for the Arg245Ala and Arg248Ala dual
mutant (Figure 5C, bottom), whose residues are positioned at the
junction site (Figure 5A).



Identification of the Cleavage Site within the
3'-Overhang Segments

The nuclease activity of Cas1 is crucial for new spacer acquisi- 
tion, with conserved residues His208, Glu141, Asp221, and 
Asp218 crucial for this function (Nunez et al., 2014). In our structure
of the complex, the phosphate group of nucleotide 29 is
positioned adjacent to the side chains of His208, Glu141, and
Asp221 that line the catalytic pocket, with the side chain of 
His208 forming a hydrogen bond with the phosphate group of 
T29 (Figure 5D). This suggests that Cas1 cleaves the phospho- 
diester bond between nucleotides 28 and 29, resulting in a 
DNA cleavage product that contains a 5-nt 3' overhang (Fig- 
ure 5E). We thus performed a cleavage assay using a 23-bp 
duplex DNA flanked by 10-nt 3' overhangs at either end. As 
shown in Figure S4B, the cleavage product is indeed 5 nt shorter,

Figure 5. The C-Terminal Domain of Cas1a Recognizes the PAM-Complementary Sequence

(A) A schematic listing intermolecular contacts in the crystal structure of Cas1-Cas2 bound to a palindromic dual-forked DNA.

(B) Electrophoretic mobility shift assay using 5' Cy3-labeled 23-bp duplex with 7-nt 3' overhangs (DNA 23-7-CTT), involving mutations of Cas1 (top) or Cas2
(bottom).

(C) Agarose gels of in vivo acquisition assays involving mutations of Cas1.

(D) Zoomed-in view of the catalytic site with nucleotides 28 and 29 located in the catalytic pocket. The DNA cleavage site is highlighted by a red arrow.

(E) Schematic diagram of Cas1 cleavage product.

(F) The C-terminal tail of Cas1b, which is shown in stick representation and magenta mesh density, covers the catalytic pocket of Cas1a.

See also Figure S5. 



confirming the proposed cleavage site. Here, seven nucleotides 
within the 3'-terminal overhangs are observed in our structure, 
suggesting that Cas1-Cas2 binds an intact substrate. Another 
residue, Asp218, was previously thought to be a catalytic residue
(Babu et al., 2011). However, in our structure, it is positioned 
away from the catalytic pocket and does not directly contact 
the DNA substrate. Instead, it stabilizes the alignment of the 
conserved catalytic residue His208 via a hydrogen bond 
(Figure 2J). 



PAM-Complementary Sequence Stabilizes C-Terminal
Tail of Cas1b

We compared the structures of PAM-complementary-containing 
complex and oligo-T-containing complex to highlight the confor- 
mational change upon binding of the PAM-complementary 
sequence. As shown in Figure S5A, the overall structures of 
these two complexes are similar, though there are distinct differences.
Thus, in the complex containing oligo-T DNA, the proline-rich
C-terminal tails of Cas1b and Cas1b' are disordered. By
contrast, in the PAM-complementary-containing complex, the
C-terminal tails of Cas1b and Cas1b' are well ordered and are 
involved in the binding of the PAM-complementary sequence 
(Figures 4B and 4C). In the PAM-complementary-containing 
complex, the loop containing the residues 278-305 of Cas1b 
covers the catalytic pocket of Cas1a, similar to a lid-like topology
(Figures 5F and S5B). Residues Ile291 and Gln287 in the C-terminal
tail are involved in the interaction with the PAM-complementary
sequence (Figures 4B and 4C), suggesting that the
interactions between the PAM-complementary sequence and
the C-terminal tail of Cas1b stabilize the fold of the latter.
Interestingly, in the DNA-free complex (PDB: 4P6I), the C-terminal
tail of Cas1b is ordered and spans Cas2 (Figure S5C) (Nunez
et al., 2014). In the PAM-complementary-containing complex,
the C-terminal tail of Cas1b does not span Cas2 any longer but
covers the catalytic pocket of Cas1a (Figure S5B).

The Conformational Changes of Cas1-Cas2 upon 
Protospacer Binding 

To investigate whether the binding of the protospacer causes 
structural rearrangements of Cas1-Cas2, we performed 
comparative superposition analysis. Comparison of the DNA- 
free (Figure S6A) and DNA-bound (Figure S6B) structures reveals 
that the protospacer binding triggers large structural rearrange- 
ments in Cas1-Cas2. The Cas1-Cas2 in its DNA-free state 
adopts a “wings-up” butterfly-shaped configuration, in which 
the four Cas1 monomers represent the wings and the Cas2 
dimer represents the body (Figure S6C, left). Superposition of 
the Cas2 dimer of the free and DNA-bound structures shows 
that the two Cas1 dimers rotate in either clockwise (Cas1a/b) 
or anti-clockwise (Cas1a'/b') directions upon complex formation
(Figures 6A, S6A, and S6B), similar to butterfly wings dropping 
into a spread-out position (Figure S6C, right). This conforma- 
tional change of the Cas1-Cas2 likely facilitates new spacer 
incorporation into the CRISPR locus. First, this rotation results 
in the generation of a flat protein surface for binding the duplex 
segment of the bound DNA (Figure 1D). Second, this rotation
repositions the two tyrosine residues from Cas1a and Cas1a' into
forming a bracket that precisely spans the full duplex length
(Figure S6D). Third, the rotation and loop (residues 163-174 of
Cas1a) movement results in the formation of an optimal catalytic
pocket within Cas1a, allowing site-specific cleavage (28-29 
step) within the 3' overhang (Figure 6B). Fourth, it creates a 
deep arch-shaped surface on the opposite face of the duplex- 
binding surface (Figure 1D). 

To understand what induces the conformational change of
Cas1-Cas2 upon protospacer binding, we superimposed either
Cas2 or Cas1b' in their DNA-free and DNA-bound states (Figures
6C and 6D). As shown in Figure S6E, two antiparallel β strands
(β6-β7) of Cas2 interact with Cas1b. A comparison of Cas2
structures in the DNA-bound and DNA-free Cas1-Cas2 (PDB: 4P6I)
shows that β6-β7 of Cas2 undergoes a significant conformational
change (Figure 6C). Upon protospacer binding, Arg77 of
Cas2, which is positioned in the loop linking β6-β5, flips by 180
degrees, allowing formation of an interaction with the DNA
duplex (Figure 6C). The downstream residue Arg78 is also
involved in duplex DNA binding (Figure 2D). Together, as a
consequence of these interactions, the β6-β7 sheet moves
away (see yellow arrow, Figure 6C) from the core ferredoxin
fold of Cas2.

Next, we compared the structures of the Cas1-Cas2 interface
by superimposing Cas1b' within the DNA-free and DNA-bound
complexes. With Cas1b' well superposed, Cas2 and Cas1a/b
rotate away from the DNA-binding interface, as indicated by
the yellow arrow (Figures 6D, S6F, and S6G). Interestingly,
β6-β7 of Cas2 also superposed well along with Cas1b' during this
superimposing of free and bound states (Figure 6D), suggesting
that the binding of the protospacer does not affect Cas1-Cas2
interaction and that the loop linking β6 and the core ferredoxin
fold of Cas2 plays an essential role in the hinge-mediated
movement upon protospacer binding.

DISCUSSION 

In this structural study, we reveal the precise nature of the DNA 
substrate of Cas1-Cas2. Furthermore, we provide evidence that 
the structural properties of this complex are the basis for the 
strict length requirements observed for newly acquired spacers 
incorporated into the CRISPR array. Lastly, we identify the 
mechanisms behind the selection of the protospacer sequence, 
namely by Cas1-Cas2 recognizing the PAM-complementary 
sequence in the invading DNA. 

Cas1a and Cas1b Subunits Perform Different Functions
during Acquisition

Cas1 proteins are asymmetrical homodimers, whereby two Cas1 
monomers forming the dimer adopt different conformations, in 
relation to the relative orientations between the N- and C-termi- 
nal domains (Figure 6E). The asymmetry of the Cas1 dimer was 
also observed in DNA-free E. coli Cas1-Cas2 (Nunez et al., 2014)
and in DNA-free Cas1 dimers from other organisms (Babu et al., 
2011; Kim et al., 2013; Wiedenheft et al., 2009). This indicates 
that it is a common feature of Cas1 that its two monomers within 
the dimer adopt different conformations, which implies that 
these two monomers are likely to have different biological 
functions. 

As shown in Figure 4A, the 3' overhang inserts into the C-terminal
domain of Cas1a and threads through the catalytic site.
The 5' overhangs interact with the C-terminal domain of
Cas1b or Cas1b' that belong to two neighboring symmetric
complexes in the crystal lattice (Figure S6H). However, it is
unclear whether this latter structural feature results from complex
formation or from crystallographic packing of another complex
next to the 5' overhangs. Nevertheless, the possibility can be
excluded that the 5' overhangs bind to Cas1b or Cas1b'. In
our structures, Cas1b and Cas1b' form contacts on either
side of the Cas2 dimer, while no contacts are observed between
Cas1a or Cas1a' with the Cas2 dimer. Arg245 and Arg248 in
Cas1b are involved in interaction with Cas2' (Figure S6E),
whereas these residues in Cas1a interact with the DNA duplex
(Figure 2B). Together, each asymmetrical Cas1 homodimer
possesses one catalytic subunit (Cas1a and Cas1a'), which
generates a 3'-OH group following cleavage and recognizes
the PAM-complementary sequence to select the protospacer,
and one subunit (Cas1b and Cas1b'), which is responsible
for forming Cas1-Cas2. Thus, our structure sheds light



Cell 163, 1-14, November 5, 2015 ©2015 Elsevier Inc. 9 




Please cite this article in press as: Wang et al., Structural and Mechanistic Basis of PAM-Dependent Spacer Acquisition in CRISPR-Cas Systems,
Cell (2015), http://dx.doi.org/10.1016/j.cell.2015.10.008



Figure 6. Conformational Change of Cas1-Cas2 upon Formation of Protospacer-Bound State and Function of Cas1 and Cas2 Proteins 

(A) Structural comparison between Cas1-Cas2 in the protospacer-bound and DNA-free (PDB: 4P6I) structures. The Cas2 protein is superimposed. Vector length
correlates with the domain motion scale. The red arrows indicate domain movements within Cas1-Cas2 complex upon protospacer binding.

(B) The loop from residues 163 to 174 adjacent to the catalytic pocket undergoes a conformational change upon binding of the 3' overhang bearing
PAM-complementary sequence (note the shift from silver to orange representations). The stacking interactions are highlighted by black double-edged arrows.

(C) Structural comparison of Cas2 in the protospacer-bound Cas1-Cas2 complex (in cyan) and DNA-free Cas1-Cas2 structure (in silver). There is good
superposition for the core ferredoxin fold of Cas2. The yellow arrow indicates the movement of β6-β7 of Cas2. Residue R77, which undergoes a significant
conformational change, is shown in a stick representation.

(D) Structural comparison of Cas1b in the protospacer-bound Cas1-Cas2 complex and DNA-free Cas1-Cas2 structure. There is good superposition for Cas1b
and β6-β7 of Cas2. The yellow arrow indicates the movement of the core fold of Cas2. The red arrow indicates the movement of the loop linking β6 and the core
fold of Cas2.

(E) Superposition of the catalytic domain of Cas1a (light orange) and Cas1b (magenta). The yellow arrow shows the conformational difference of the N-terminal
domain.

(F) The arch-like structure may involve a binding site for the target DNA within its positively charged patches highlighted by a black box. The Cas1-Cas2 complex is
shown as a surface representation and is labeled according to its electrostatic potential (red, negative charge; blue, positive charge). The DNA is shown as yellow
spheres.

(G) In vivo acquisition assay with potential Cas1 and Cas2 mutations positioned within the postulated target DNA binding sites.

See also Figure S6. 

on the question of why Cas1 dimers are asymmetric, with the
subunits fulfilling two different functions. 

Function of Cas2 

Our structures of the complexes also shed light on the role of 
Cas2 during CRISPR adaption. The Cas2 dimer bridges two 
Cas1 dimers, forming Cas1-Cas2, which then provides the binding
surface for the protospacer DNA. Together with two Cas1 dimers,
Cas2, acting as a space holder, measures the length of the
duplex by ensuring that the Tyr22 residues of Cas1a and Cas1a'
are positioned exactly 23 nt apart from each other (Figure 2F). 
Moreover, the Cas2 dimer plays crucial roles in stabilizing the 
bound duplex DNA by forming hydrogen bonds with the back- 
bone of the DNA duplex (Figure 2D). Also, opposite to the duplex 
binding surface of Cas2 is an arch-like structure, which is likely to 
be involved in recognition of the target DNA, based on our obser- 
vation that the arch topology contains positively charged 
patches formed by residues Lys38 and Arg40 of Cas2 and 
Arg256 and Lys259 of Cas1b (Figure 6F). Notably, Lys38Ala 
and Arg40Ala (Cas2) dual mutant significantly reduced spacer 
acquisition, while no insertion was observed for the Arg256Ala 
and Lys259Ala (Cas1) dual mutant (Figure 6G). However, further
studies will be required to verify the target DNA binding site. 
Thus, the Cas2 dimer acts as an adaptor protein, bringing two 
Cas1 dimers together while stabilizing and measuring the length 
of the protospacer DNA, as well as binding to the target DNA. 

Cas1-Cas2 Predetermines the Length of the 
Protospacer 

Our structural analysis revealed that the most promising sub- 
strate of Cas1-Cas2 is composed of a dual-forked DNA, which 
contains both a double-stranded duplex and 3' single-stranded 
overhangs on both ends. Importantly, the site of interaction 
involving the catalytic residues with the DNA is 5 nt away from 
the end of the duplex (Figure 5D). Thus, the putative DNA frag- 
ment contains 23 nt of the duplex region, as well as 5-nt 3' over- 
hangs at both ends, resulting in a total distance of 33 nt from one 
cleaved 3' end to the other (Figure 5E). This finding is consistent 
with a recently proposed model (Nunez et al., 2015), which sug- 
gests that Cas1 -Cas2 inserts the invading DNA into the CRISPR 
locus like an integrase, with the length of the newly acquired 
spacer in the CRISPR locus depending on the 3' ends of the 
two strands of the protospacer DNA. Therefore, our structures 
of the Cas1-Cas2-DNA complex most likely represent the 
Cas1-Cas2-protospacer-containing DNA complex. These structures
provide insights into how Cas1-Cas2 predetermines the
length of protospacer by utilizing two Tyr22 residues to measure 
a 23-bp duplex, and the positioning of the catalytic residues determines
the cleavage position, thereby generating 5-nt 3' overhangs
on both strands. Thus, the architecture of the Cas1-Cas2-protospacer
DNA complex provides the basis for the observed
length of 33 nt of the DNA cleavage product, thereby explaining 
what factors contribute to the determination of the constant 
length of newly acquired spacer in vivo. 
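
The length arithmetic described above can be written out as a minimal sketch. The constant and function names are illustrative assumptions, not part of the study; only the values 23 bp, 5 nt, and 33 nt come from the text.

```python
# Illustrative arithmetic only; names are hypothetical, values are from the text.
DUPLEX_BP = 23    # duplex length bracketed by the two Tyr22 residues
OVERHANG_NT = 5   # 3' overhang left on each strand after Cas1 cleavage


def protospacer_span(duplex_bp: int = DUPLEX_BP, overhang_nt: int = OVERHANG_NT) -> int:
    """Return the distance in nt from one cleaved 3' end to the other."""
    return duplex_bp + 2 * overhang_nt


if __name__ == "__main__":
    print(protospacer_span())  # 23 + 5 + 5
```

With the default values the function returns 33, the length reported for the cleavage product.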

Source of Protospacer 

Prior to our study, the exact nature of the DNA substrate associated
with Cas1-Cas2 was unknown. Here, we reveal that, apart
from a double-stranded duplex region, single-stranded overhangs
are critical for DNA-protein complex formation. We
show that the unique interaction between the 3' overhang and 
the catalytic domain of Cas1a is possible for ssDNA overhangs, 
but not for rigid dsDNA duplexes. In addition, our binding assay 
suggests that a 3' overhang containing a minimum of 7 nt is 
essential for the association between Cas1-Cas2 and the DNA
substrate (Figure 4E), possibly because the last 3 nt (positions 
5-7 in the overhang) are, in fact, complementary to the AAG-con- 
taining PAM sequence in the invading DNA, thereby explaining 
why each new spacer starts with a G residue (Yosef et al., 2013).

If our model is correct, the question arises as to where such a 
single-stranded protospacer overhang would occur in an in vivo 
situation, i.e., in the invading phage or plasmid DNA. Intriguingly, 
spacer acquisition was shown to be highly replication depen- 
dent. The DNA degradation intermediates of RecBCD complex 
present at stalled replication forks might be the source of new 
spacers, as these intermediates include ssDNA fragments and 
degraded dsDNA (Levy et al., 2015; Paez-Espino et al., 2013). 
This finding fits well with our analysis and addresses the question 
of the origin of the single-stranded protospacer 3' overhang (Fig- 
ure 7A), and our results also address why the protospacer hot- 
spots are located between sites of stalled replication forks and 
Chi sites. Together, our structures strongly suggest that E. coli 
protospacers are recognized and associated with Cas1-Cas2 
in a dual-forked DNA topology, consisting of a 23-bp duplex 
and a minimal 7-nt single-stranded 3' overhang in vivo. There- 
fore, in addition to the PAM that affects the spacer choice, the 
structural feature of the protospacer DNA also influences the fre- 
quency of protospacer incorporation. 

Protospacer Selection 

The interactions observed in our structure between Cas1a and 
5'-CTT-3' (Figures 4B and 4C), together with the EMSA results 
indicating the minimal 7-nt length requirement of the 3' over- 
hangs (Figure 4E), strongly suggest that the PAM-complemen- 
tary sequence (being the last three nucleotides in the 7-nt 3' 
overhang) plays a significant role in ensuring proper complex for- 
mation. In agreement with the important role of the length of 3' 
overhangs, the complex of Cas1-Cas2 co-crystallized with sin- 
gle-forked DNA containing 10-bp duplex and only six T over- 
hangs at both 3' and 5' ends was free of DNA, indicating insuffi- 
cient association between DNA and protein complex. Together, 
these findings support the notion that 3' overhangs of defined 
length and the PAM-complementary sequence are both essen- 
tial for DNA binding to Cas1-Cas2 and thus critical for spacer 
acquisition. In all likelihood, these results explain the observation 
that the AAG motif in the PAM sequence enhances adaptation of the
protospacer adjacent to it (Yosef et al., 2013). 
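
The two binding requirements discussed above (a 3' overhang of at least 7 nt whose positions 5-7 read 5'-CTT-3', the complement of the AAG PAM) can be sketched as a simple check. This is only an illustration: the function names are hypothetical, and the convention that the overhang is written 5' to 3' starting at the duplex end is an assumption.

```python
# Hypothetical helper names; only the 7-nt minimum and the AAG/CTT pair come from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAM = "AAG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


PAM_COMPLEMENT = reverse_complement(PAM)  # 5'-CTT-3'


def overhang_is_compatible(overhang: str, min_len: int = 7) -> bool:
    """Check a 3' overhang written 5'->3' from the duplex end (assumed convention).

    True if it is at least min_len nt long and positions 5-7 are the
    PAM-complementary trinucleotide.
    """
    overhang = overhang.upper()
    return len(overhang) >= min_len and overhang[4:7] == PAM_COMPLEMENT
```

For example, `overhang_is_compatible("TTTTCTT")` is true, whereas a six-T overhang fails the length test, in line with the DNA-free crystals described above.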

PAM recognition is essential for protospacer selection during 
acquisition and for target selection during crRNA interference 
(Deveau et al., 2008; Mojica et al., 2009). In the acquisition machinery
of the E. coli type I system, Cas1a recognizes the PAM-complementary
sequence in its single-stranded form. In the type II
system, during crRNA interference, the target DNA flanked by 
PAM sequence (5'-NGG-3') is recognized by Streptococcus 
pyogenes Cas9 in its dsDNA form (Anders et al., 2014). Interest- 
ingly, in the type II CRISPR-Cas system, Cas9 is not only 
involved in the interference, but also in the spacer acquisition by
associating with Cas1, Cas2, and Csn2 forming the acquisition
machinery, thus coupling the interference and the acquisition 
machineries (Heler et al., 2015). 

In the spacer acquisition step of both type I and II systems, 
Cas1 and Cas2 are critical, and the cleavage activity of Cas1 is 
essential for acquisition. By contrast, Cas9 binds the PAM in 
the type II system, while Cas1 recognizes the PAM-complemen- 
tary sequence in the type I system. Whether Cas1 is also involved 
in the protospacer selection in the type II system remains under 
debate. 

Mechanism of CRISPR Acquisition 

Given that Cas1-Cas2 is symmetric, both Cas1a and Cas1a'
are capable of recognizing and binding the PAM-complementary
sequence (5'-CTT-3') and cleaving the overhangs of the
protospacer to generate two 3'-OH groups. Following cleavage,
Cas1-Cas2 catalyzes the integration of the incoming DNA at the
leader end of the CRISPR locus by two nucleophilic attacks at two
sites on opposing strands (Nuñez et al., 2015; Rollie et al., 2015).
The leader-repeat 1 segment is asymmetric, and the two sites on the
target DNA have different sequences, with the choice of 3'-OH
selection based on the terminal residue being a C. As shown in
Figure 7B, site 2 (5'-CGG-3') may preferentially select the 3'-OH
of C. Thus, the leader sequence and the sequences surrounding the
protospacer integration sites may play a critical role in correctly
orienting the 3'-OH of the C end of the protospacer DNA substrates
for incorporation within the CRISPR locus. A recent study found
that an artificial leader-Cas combination results in the insertion of
the complex in the wrong orientation (Díez-Villaseñor et al., 2013).
This observation is consistent with our model shown in Figure 7B.
Therefore, we speculate that the sequence of leader-repeat 1 within
the CRISPR locus may affect the binding orientation of the
Cas1-Cas2-protospacer complex on the CRISPR locus.

Figure 7. Model of CRISPR Spacer Acquisition

(A) Model explaining the capture of new DNA sequences from invading
nucleic acid. Note the schematic representations of the “wing-up” and
“wing-down” conformations of the apo- and protospacer-bound Cas1-Cas2
complexes. To simplify, both monomers in a Cas1 dimer are in orange.

(B) Model of DNA integration into the host CRISPR array. The
Cas1a-mediated cleavage sites located on the 3' overhangs, which are
positioned 5 residues from the terminal base pairs, are represented
by purple scissors. The cleavage product has 5-nt 3' overhangs with
3'-OH groups on both strands, resulting in a distance between the 3'
overhang ends of 33 nucleotides. The two 3' ends of the incoming
protospacer are involved in nucleophilic attack on the CRISPR locus,
as shown by the dashed red and blue arrows, respectively. Lastly,
the gapped duplex is repaired by the host DNA replication machinery.
The GC base pair originating from the PAM sequence is highlighted by
a green background. The leader is in gray, repeat 1 in green, and
spacer 1 in cyan.

CRISPR Adaption Likely Works through a Cut-and-Paste 
Mechanism 

As Cas1 and Cas2 proteins are essential in both naive and 
primed adaptation, we propose that our structures of the 
complexes are likely to be suitable for both types of immunity. 
During primed adaptation, the partial ssDNA, resulting from 
the Cas3 degradation product or from an R loop formed 
upon crRNA binding target DNA, might be used as a precursor
for new spacers by Cas1-Cas2. To date, the general
assumption is that spacer acquisition works through a copy- 
and-paste mechanism, as opposed to a cut-and-paste pro- 
cess. Our structures reveal that Cas1 selects and cuts the 
foreign DNA to generate a spacer, which is in agreement 
with previous studies stating that Cas1-Cas2 mediates the 
cleavage-ligation reaction (Arslan et al., 2014), indicating that 
the CRISPR adaption likely works via a cut-and-paste 
mechanism. 

The acquisition of new spacer sequences is absolutely essen- 
tial for acquiring immunological memory and is crucial for main- 
taining an advantage over invading DNA elements by continu- 
ously updating the DNA library for crRNA interference of 
invading DNA elements. Our study shows that E. coli possesses 
a sophisticated machinery that utilizes frequently occurring PAM 
sequences as essential identification markers, which allow for 
efficient cleavage of the DNA sequence once embedded into 
Cas1-Cas2. Therefore, Cas1-Cas2 acts as a sequence-specific 
integrase. Of equal importance, this protein complex was de- 
signed by nature in such a manner that the protospacer binding 
results in a major conformational change in the protein, in the 
process of which an arch-like structure is created that is likely 
to be involved in proper binding to the first repeat of the CRISPR 
locus. These findings should lay the foundation and greatly facil- 
itate the quest for identifying additional insights into the struc- 
tural mechanisms responsible for the integration of new spacers 
into the CRISPR locus. 

EXPERIMENTAL PROCEDURES 

Detailed experimental procedures are described in the Supplemental Experi- 
mental Procedures. 

E. coli Cas1 and Cas2 were cloned into the pET-sumo expression vector and
expressed in E. coli Rosetta2 (DE3) (Novagen). Cas1 was purified by
chromatography on nickel and Heparin HP column (GE Healthcare). Cas2 was purified by
chromatography on nickel, Q FF column, and Superdex 200 (GE Healthcare).
Cas1 and Cas2 proteins were concentrated to 35 mg/ml and 5 mg/ml, respectively.
The Cas1 and Cas2 mutants were made with the Quick-Change kit and
verified by sequencing. All mutant proteins were expressed with the same pro- 
tocol as that used for the wild-type protein. 

The Cas1-Cas2 single-forked DNA complex was reconstituted by incubating
Cas1, Cas2, and single-forked DNA at the molar ratio of 1:1.1:0.6 on
ice for 30 min and was further purified by gel filtration. The Cas1-Cas2
dual-forked DNA complex was reconstituted on ice for 30 min by incubating
Cas1, Cas2, and DNA at the molar ratio of 1:1.1:0.3.

The Cas1-Cas2-DNA complexes were crystallized at 16°C by the hanging-drop
vapor diffusion method. All Cas1-Cas2-DNA complex crystals were obtained
by mixing equal volumes of complex solution and reservoir solution.
X-ray diffraction data were collected at 100 K on the beamlines BL-17U and 
BL-19U at Shanghai Synchrotron Radiation Facility. All structures were solved 
by molecular replacement using the Cas1 monomer and Cas2 monomer in the
DNA-free Cas1-Cas2 structure as the search models. All structures were
refined using the programs Refmac and Phenix and were manually built with
COOT. All structural figures were prepared with PyMOL (http://pymol.org).

Binding affinities of various DNA molecules to Cas1-Cas2 were tested using
an EMSA. Functional importance of DNA-interacting residues was validated by 
EMSA and by using an in vivo spacer acquisition assay, as described previ- 
ously (Yosef et al., 2012). Furthermore, the cleavage assays were undertaken 
using 5' Cy3-labeled DNA with 23-bp duplex flanked by 10-nt 3' overhangs. 
The sequences of all DNA oligonucleotides used in the study are listed in 
Table S2. 



ACCESSION NUMBERS 

The atomic coordinates of the Cas1-Cas2-DNA complexes have been deposited
in the Protein Data Bank with accession numbers listed in parentheses:
Cas1-Cas2 single-forked DNA (PDB: 5DQU), Cas1-Cas2 dual-forked DNA
with 23-bp duplex (PDB: 5DLJ), Cas1-Cas2 dual-forked DNA with 22-bp
duplex (PDB: 5DQT), and Cas1-Cas2 bound to the PAM-complementary
sequence (PDB: 5DQZ).
SUMMARY

CRISPR-Cas adaptive immune systems protect bac- 
teria and archaea against foreign genetic elements. 
In Escherichia coli, Cascade (CRISPR-associated 
complex for antiviral defense) is an RNA-guided 
surveillance complex that binds foreign DNA and 
recruits Cas3, a trans-acting nuclease-helicase for
target degradation. Here, we use single-molecule 
imaging to visualize Cascade and Cas3 binding to 
foreign DNA targets. Our analysis reveals two distinct 
pathways dictated by the presence or absence of a 
protospacer-adjacent motif (PAM). Binding to a pro- 
tospacer flanked by a PAM recruits a nuclease-active 
Cas3 for degradation of short single-stranded re- 
gions of target DNA, whereas PAM mutations elicit 
an alternative pathway that recruits a nuclease-inac- 
tive Cas3 through a mechanism that is dependent on 
the Cas1 and Cas2 proteins. These findings explain 
how target recognition by Cascade can elicit distinct 
outcomes and support a model for acquisition of new 
spacer sequences through a mechanism involving 
processive, ATP-dependent Cas3 translocation 
along foreign DNA. 



INTRODUCTION 

Many prokaryotes harbor an RNA-guided adaptive immune 
system comprised of a genetic locus called CRISPR (clustered 
regularly interspaced short palindromic repeats) and the 
CRISPR-associated (cas) genes (Barrangou and Marraffini,
2014; van der Oost et al., 2014; Westra et al., 2012a). The
CRISPR locus was first identified in Escherichia coli as an unusual
series of 29-bp repeats separated by 32-bp spacer sequences
(Ishino et al., 1987). It was later recognized that these
spacers were derived from foreign genetic elements, suggesting 
the CRISPR locus might serve as an RNA-guided immune sys- 
tem (Bolotin et al., 2005; Makarova et al., 2006; Mojica et al., 
2005). It is now known that CRISPR-Cas immunity is conferred 
through integration of short DNA fragments into the CRISPR 
locus, and these spacer sequences record the history of past 
infections (Barrangou and Marraffini, 2014; van der Oost et al.,
2014; Westra et al., 2012a). The CRISPR locus is transcribed, 
and the resultant transcript is processed into shorter CRISPR- 
RNAs (crRNAs), each containing a sequence complementary 
to a previously encountered foreign DNA element. 

CRISPR-Cas systems are classified as types I, II or III, which 
can be distinguished based on the presence of the signature 
Cas3, Cas9, or Cas10 genes, respectively (Barrangou and Marraffini,
2014; Westra et al., 2012a). Type I systems are the most common,
and much of our understanding of type I CRISPR-Cas systems
comes from studies of E. coli Cascade (CRISPR-associated
complex for antiviral defense), which is comprised of the five
proteins Cse1, Cse2, Cas7, Cas5e, and Cas6e. These proteins
assemble on a 61-nt crRNA, yielding a 405-kDa complex. The
crRNA contains the 32-nt spacer sequence, which directs 
Cascade to sequences (protospacers) in foreign DNA, leading 
to formation of an R-loop intermediate. Cascade then recruits 
Cas3, which has an N-terminal histidine-aspartate (HD) nuclease 
domain and C-terminal superfamily 2 (SF2) helicase domain, to 
degrade the DNA (Mulepati and Bailey, 2013; Sinkunas et al., 
2013). 

Figure 1. Programmed Target Binding by E. coli Cascade

(A) Overview of DNA curtains.

(B) Schematic of E. coli Cascade programmed with a crRNA targeting
one of three different binding sites (designated λ1, λ2, and λ3) on
λ DNA.

(C) Wide-field TIRF microscopy image showing QD-tagged Cascade
(magenta) bound to DNA (green) at λ1.

(D) Wide-field image showing Cascade bound at λ3.

(E) Binding distribution for Cascade targeted to each of the three
protospacers; error bars here and all subsequent binding
distributions represent 95% confidence intervals obtained through
bootstrap analysis.

Cascade must discriminate between spacer sequences
found in the bacterial chromosome and those found in foreign
DNA. This discrimination is thought to be accomplished
through recognition of a trinucleotide sequence motif called
the protospacer-adjacent motif (PAM; 5'-A[A/T]G-3' for E. coli
Cascade), which is adjacent to the protospacer in foreign
DNA, but absent in the CRISPR locus. Strict sequence
requirements present a potential weakness because mutations
in either the PAM or protospacer can allow foreign DNA to 
escape CRISPR-Cas immunity (Semenova et al., 2011). How- 
ever, bacteria can rapidly restore immunity using a positive- 
feedback loop to update the CRISPR locus (Datsenko et al., 
2012; Fineran et al., 2014). The mechanism of primed spacer 
acquisition (priming) remains perhaps one of the most poorly 
understood aspects of CRISPR-Cas immunity (Datsenko 
et al., 2012; Fineran et al., 2014; Heler et al., 2014). Priming re- 
quires Cascade with a crRNA bearing at least partial comple- 
mentarity to the escape target, suggesting Cascade must be 
able to locate targets even when they bear mutations sufficient 
to escape immunity (Datsenko et al., 2012). Priming also 
requires Cas3 (Datsenko et al., 2012) and the Cas1-Cas2 complex
(Nuñez et al., 2014), which integrate new sequences into
the CRISPR locus (Nuñez et al., 2015). It is not known how
these complexes elicit the priming response to foreign ele- 
ments bearing escape mutations. 

Here, we use single-molecule imaging to visualize individual 
Cascade complexes as they search for protospacers within 
the bacteriophage λ genome. Our work reveals PAM-dependent
and PAM-independent search pathways. The PAM-
dependent pathway is highly efficient and allows Cascade to 
recruit Cas3 for strand-specific degradation of the target 
genome. The PAM-independent pathway is less efficient, but
Cascade can still bind tightly to the DNA, ensuring that it can
initiate the sequence of molecular events that precede primed
spacer acquisition. Through this pathway, Cas3 recruitment becomes
strictly dependent on Cas1-Cas2, and Cas1-Cas2 also attenuate
Cas3 nuclease activity and enable Cas3 to rapidly translocate in
either direction along the foreign DNA. These results establish
Cas1-Cas2 as a trans-acting factor necessary for the recruitment
and regulation of Cas3 at escape targets. Based on our findings, we
propose a mechanistic framework describing how Cascade, Cas1,
Cas2, and Cas3 work together to process and disable foreign
genetic elements.

RESULTS 

DNA Curtain Assay for Target Binding by Cascade 

We sought to establish a DNA curtain assay using total internal
reflection fluorescence (TIRF) microscopy for visualizing the
behavior of Cascade on individual molecules of wild-type phage
λ DNA (λWT) (Figure 1A; Supplemental Experimental Procedures)
(Greene et al., 2010). In brief, the surface of a microfluidic sample
chamber was coated with a lipid bilayer, and DNA molecules
were anchored to the bilayer through a biotin-streptavidin interaction.
The DNA was then pushed to the leading edges of nanofabricated
barriers to lipid diffusion, and the downstream ends
were anchored to pedestals through antibody-hapten linkage
(Gorman et al., 2010, 2012). Cascade was prepared with one
of three crRNAs targeted to different regions of λWT and then
labeled with antiFLAG-quantum dots (QDs) attached to the
3xFLAG-tagged Cas6e subunit (Figure 1B). When visualized on
DNA curtains, Cascade bound to target sites corresponding to
DNA sequences complementary to the three different crRNAs
(Figures 1C-1E). Cascade remained bound for at least 57 min;
this lifetime represents a lower limit for the Cascade-protospacer
interaction because these measurements are limited by the
stability of the 3xFLAG-antiFLAG interaction (Sternberg et al.,
2014). Stable binding was not observed for Cascade bearing a
control crRNA (P7-crRNA) that was not complementary to λWT.

Figure 2. Cascade Searches for PAMs while Interrogating Foreign DNA

(A) Kymographs highlighting examples of Cascade binding events over
two different time regimes (see scale bars). Examples of transient
sampling and stable recognition are highlighted.

(B-D) Distribution of PAMs (blue line) and transient binding events
for Cascade programmed with (B) the λ1-crRNA, (C) the λ3-crRNA, or
(D) a P7-crRNA. Count refers to number of occurrences within 1 kbp
of DNA. The locations of the λ1 and λ3 target sites are indicated,
and the heat map color-coding reflects the binding dwell time (ti)
relative to the mean dwell time (t).

(E-G) Correlation of PAMs with the transient binding events for
Cascade programmed with (E) the λ1-crRNA, (F) the λ3-crRNA, or (G)
P7-crRNA, as indicated. Outlying data points (colored green and
boxed) reflect underrepresented binding events at PAM sites near the
ends of the DNA; detection of binding at these sites is hindered by
the chromium barriers.

See also Figure S1.

PAM-Dependent Target Recognition 

Next, we sought to determine how Cascade locates protospacers
by visualizing reactions in real time. Most Cascade
(>75%) appeared immediately at the protospacer without exhibiting
any evidence of microscopically detectable motion along
the DNA (Figure 2A). This finding leads us to conclude that
Cascade located the protospacer through a pathway that was
dominated by 3D diffusion at the microscopic scale. Based on
our optical resolution limits, these experiments provide an upper
limit on any potential 1D diffusion by Cascade of no more than
~250 bp, although we do not exclude the possibility that Cascade
may diffuse shorter distances along the DNA (Gorman et al., 2010,
2012). The remaining fraction of Cascade molecules (<25%)
underwent optically detectable 1D diffusion; we did not pursue a
detailed analysis of this behavior because it coincided with a
loss of binding specificity and appeared to arise from Cascade
aggregates (data not shown).

Analysis of the 3D events revealed long-lived binding to the
protospacers, as well as transient binding events all along the DNA
(Figure 2A). The λ genome contains a total of 3,151 PAM sites
(5'-A[A/T]G-3'), corresponding to ~1 PAM per 15.4 bp, which are
asymmetrically distributed across the phage genome. Cascade did not
randomly sample the DNA; instead, the transient binding events were
correlated with the PAM distribution (Figures 2E-2G), as we have
reported for Streptococcus pyogenes Cas9 (Sternberg et al., 2014).
Control reactions using Cascade programmed with P7-crRNA revealed a
similar pattern of transient binding (Figures 2B-2G), and we could
detect no binding activity for Cascade lacking Cse1 (data not shown),
which is the subunit responsible for PAM recognition (Sashital et al.,
2012). Cascade programmed with either λ1-crRNA or λ3-crRNA displayed
many reversible binding events at their targets, which are revealed by
the ~50% increased prevalence of longer-lived intermediates at both of
these target sites relative to non-target sites (Figures 2B, 2C, and
S1A), and also by the peak in binding at λ1 for the λ1-crRNA, which is
observable due to the overall lower density of PAM sites in this region
of DNA (Figures 2B and 2C). This category of long-lived, but
reversible, binding events at the protospacers likely represents
abortive engagement, suggesting Cascade must often make multiple
attempts before stably engaging the protospacer, similar to what we
have observed for Cas9 (Sternberg et al., 2014).

Figure 3. Recognition of Escape PAM Mutants

(A) Schematic of the phage construct bearing two identical
protospacers, one with a cognate PAM (λ3) and the other with an
escape PAM (mutλ3).

(B) Kymograph highlighting example of Cascade binding to the mutλ3
through 3D diffusion.

(C) Wide-field images showing binding to each of the two targets at
different Cascade concentrations following a 10-min incubation.
Arrowheads indicate the locations of the λ3 (green) and mutλ3
(magenta) targets.

(D) Binding distributions showing relative occupancy at each Cascade
concentration.

(E) Quantification of percent occupancy; 0 indicates no detectable
binding.

(F) Survival probability plots for Cascade bound to the two targets;
error bars here and all subsequent survival probability plots
represent 70% confidence intervals obtained through bootstrap
analysis.

See also Table S1.
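
The PAM density quoted above (3,151 sites, roughly one per 15.4 bp) can be checked with a short sketch. Two things here are assumptions rather than statements from the paper: the λ genome length of 48,502 bp is the standard reference value, and counting sites on both strands is our reading of how the total was obtained. The helper names are hypothetical.

```python
import re

# Assumption: PAM sites are counted on both strands; helper names are hypothetical.
PAM_REGEX = re.compile(r"A[AT]G")          # 5'-A[A/T]G-3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_pam_sites(seq: str) -> int:
    """Count 5'-A[A/T]G-3' occurrences on the given strand and its complement."""
    seq = seq.upper()
    forward = len(PAM_REGEX.findall(seq))
    reverse = len(PAM_REGEX.findall(reverse_complement(seq)))
    return forward + reverse


LAMBDA_GENOME_BP = 48_502   # reference genome length, not stated in the text
REPORTED_PAM_SITES = 3_151  # value quoted in the text

mean_spacing_bp = LAMBDA_GENOME_BP / REPORTED_PAM_SITES
print(round(mean_spacing_bp, 1))
```

Running `count_pam_sites` on a full λ sequence would be the natural next step; the sketch only verifies that the two quoted numbers are mutually consistent.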

The transient binding events exhibited double-exponential decays
similar to S. pyogenes Cas9 (Sternberg et al., 2014), with
lifetimes of ~3 and ~25 s (Figures S1A-S1D), indicating that at
least two intermediates exist on the pathway toward target
recognition. The lifetimes of these intermediates were not appreciably
affected by either salt concentration or temperature (Figure S1D),
similar to findings for Cas9 (Sternberg et al., 2014).
These characteristics, more commonly attributed to site-specific
association, provide further evidence that the initial observed
interactions are based on a sequence-dependent association
with PAM sites rather than on nonspecific interactions with the
DNA phosphate backbone.
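
As a minimal sketch of what a double-exponential decay with lifetimes of ~3 and ~25 s looks like, the survival probability can be modeled as a weighted sum of two exponentials. The relative weight of the two populations is not given in the text, so `fast_fraction` is a placeholder assumption, as is treating the quoted lifetimes as exponential time constants.

```python
import math


def survival_probability(
    t: float,
    fast_fraction: float = 0.5,  # placeholder; the actual amplitude is not stated
    tau_fast: float = 3.0,       # seconds, from the text
    tau_slow: float = 25.0,      # seconds, from the text
) -> float:
    """Fraction of transient binding events still bound after t seconds."""
    fast = fast_fraction * math.exp(-t / tau_fast)
    slow = (1.0 - fast_fraction) * math.exp(-t / tau_slow)
    return fast + slow


if __name__ == "__main__":
    for t in (0, 3, 10, 25, 60):
        print(t, round(survival_probability(t), 3))
```

A single-exponential process would give a straight line on a semi-log plot of survival against time; two time constants produce a visible bend, which is the signature of at least two intermediates.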

PAM-Independent Target 
Recognition 

Next, we sought to determine whether and how Cascade locates
targets that lack a canonical PAM. For this, we generated a new
phage construct bearing two duplicate targets (Figure 3A).
One of the protospacers (λ3) was adjacent to a cognate PAM
[5'-ATG-3'], whereas the second protospacer (mutλ3) was adjacent
to a mutated PAM [5'-ATT-3']. This escape PAM (ePAM) was chosen
because it enables an invading DNA to escape the CRISPR-Cas
machinery, but still elicits a rapid priming response (Datsenko
et al., 2012; Fineran et al., 2014). Surprisingly, Cascade could
still bind both protospacers, and binding of mutλ3 still occurred
through 3D diffusion (Figure 3B), but recognition of mutλ3 was
much less efficient than recognition of λ3 (Figures 3C-3E). This
difference was evidenced by the ~10-fold higher Cascade
concentration necessary to achieve similar levels of occupancy at
both protospacers. Despite the large difference in initial
recognition, the lifetimes of Cascade at λ3 and mutλ3 were
comparable (57 and 40 min, respectively; Figure 3F). We conclude
that PAMs increase the efficiency of target recognition, but that
Cascade is still capable of protospacer recognition and
high-affinity binding in the absence of a cognate PAM, and this
conclusion is consistent with previous studies (Szczelkun et al.,
2014).
Figure 4. Cas3 Generates an ssDNA Gap at the λ3 Protospacer

(A) Image showing RPA-eGFP foci at λ3 for reactions with unlabeled
Cascade and unlabeled Cas3.

(B) Control images showing that RPA-eGFP foci are not present when
Cas3 is omitted from the reactions; the upper and lower panels show
the same field of view.

(C) Requirements for RPA-eGFP foci formation at λ3.

(D) Distribution of RPA-eGFP foci in reactions containing both
Cascade and Cas3; Count refers to the number of occurrences within
1 kbp of DNA.

(E) Signal intensities for RPA-eGFP foci. The intensity of a focus
comprised of three molecules of RPA-eGFP is indicated, and each
successive bin corresponds to ~1 additional molecule of RPA-eGFP.
The heat map color-coding in (D) and (E) is the same.

(F) Representative stepwise photobleaching curve used to estimate
the number of RPA-eGFP molecules in each focus.

See also Figure S2.
Cas3 Recruitment Leads to Disruption of the Target DNA
Duplex 

Next, we sought to visualize Cascade-dependent recruitment
of Cas3. Cas3 interacts with Cse1 and the displaced single-stranded
DNA (ssDNA) strand that is generated by R-loop formation
(Hochstrasser et al., 2014; Mulepati and Bailey, 2013; Sinkunas
et al., 2013). Upon recruitment, Cas3 first nicks the DNA and
is thought to then translocate in the 3'→5' direction along the
non-target strand, while unwinding and degrading duplex DNA
through an ATP-, Mg2+-, and Co2+-dependent mechanism (Mulepati and
Bailey, 2011, 2013; Sinkunas et al., 2013). Cas3 degrades both
target DNA strands in bulk biochemical assays (Mulepati and Bailey,
2013; Sinkunas et al., 2013). However, these measurements
use relatively high concentrations of Cas3 (50 nM-1 µM)
(Hochstrasser et al., 2014; Mulepati and Bailey, 2011, 2013;
Sinkunas et al., 2011, 2013), suggesting that DNA degradation may
be due to the action of multiple Cas3 molecules, only the first of
which is directly recruited by Cascade. Given these considerations,
it is plausible that the initial Cascade-recruited molecule of Cas3
only introduces a small nick or ssDNA gap in the target
DNA (Mulepati and Bailey, 2013).

We reasoned that if Cas3 was initially 
generating ssDNA after loading at 
Cascade, then this might be revealed 
in reactions with low concentrations of 
Cas3 (4 nM), followed by the addition 
of eGFP-tagged replication protein A 
(RPA), which binds ssDNA. When RPA-eGFP was added after 
Cascade and Cas3, bright eGFP foci were detected at the X3 
protospacer (Figure 4A). Formation of RPA-eGFP foci was 
dependent on Cascade, Cas3, ATP, and Co^'" and the conditions 
under which we detected RPA-eGFP foci paralleled the condi- 
tions necessary for plasmid degradation in bulk biochemical as- 
says (Figures 4B and S2C). Furthermore, RPA-eGFP foci were 
not observed for a Cas3 nuclease mutant (D75A) (Figures 4C 
and S2A-S2C). Notably, the DNA in the single-molecule assays 
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Figure 5. Cascade-Mediated Recruitment 
of Cas3 

(A) Image showing that QD-tagged Cas3 is re- 
cruited to unlabeled Cascade at >.3. 

(B) Binding of Cas3 to \ 3 . The distribution is 
segregated into the translocation (orange) and 
stationary (green) Cas3 populations. 

(C) Survival probabilities of the stationary Cas3 
population. 

(D) Kymograph illustrating the translocation of 
Cas3 away from \3 in a reaction with unlabeled 
Cascade. The delay period prior to the initiation of 
Cas3 translocation is indicated. 

(E) Two-color experiment showing that Cas3 
(green) translocates away from Cascade (magenta). 

(F) Survival probability (delay time) of the trans- 
locating population of Cas3 prior to moving away 
from >.3. 

(G) Cas3 velocity distribution. 

(H) Cas3 processivity distribution. 

(I) Kymograph showing an example of Cas3 
repeatedly looping the DNA. 

(J) Intensity profile showing the increase in Cas3 
fluorescence signal coinciding with DNA loop 
formation. 

See also Figure S3 and Movie S1.

was not liberated from the flow cell surface and there was no evidence
for long tracts of RPA-eGFP, indicating that Cas3 only generated a small
ssDNA gap. To estimate the size of the ssDNA gaps, we measured the
intensity of the RPA-eGFP foci (Figures 4D and 4E) and then used
photobleaching steps to roughly estimate the number of RPA-eGFP molecules
present (Figure 4F; Supplemental Experimental Procedures). We estimate
that the average focus contained ~8-10 molecules of RPA-eGFP,
corresponding to ~240-300 nt of ssDNA. These results suggest that the
first Cas3 molecule recruited by Cascade makes a short ssDNA gap adjacent
to the protospacer.
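
The arithmetic behind this estimate can be sketched as follows. This is only an illustration, not the analysis code used in the study: the function name and the example intensities are hypothetical, and the footprint of ~30 nt per RPA-eGFP is simply the value implied by the 8-10 molecule and 240-300 nt figures quoted above.

```python
def estimate_ssdna_gap(focus_intensity, single_step_intensity, nt_per_rpa=30.0):
    """Return (RPA copies, gap length in nt) for one RPA-eGFP focus.

    focus_intensity       -- integrated intensity of the focus (arbitrary units)
    single_step_intensity -- intensity lost in one photobleaching step (same units)
    nt_per_rpa            -- assumed ssDNA footprint per RPA-eGFP molecule
    """
    if single_step_intensity <= 0:
        raise ValueError("single_step_intensity must be positive")
    n_rpa = focus_intensity / single_step_intensity
    return n_rpa, n_rpa * nt_per_rpa


# Hypothetical focus that is nine times brighter than one bleaching step.
copies, gap_nt = estimate_ssdna_gap(900.0, 100.0)
print(f"{copies:.0f} RPA-eGFP copies, about {gap_nt:.0f} nt of ssDNA")
```

With 8 and 10 copies the same function returns 240 nt and 300 nt, matching the range reported in the text.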



Cas3 Recruitment to Target-Bound 
Cascade 

Next, we sought to visualize the behavior of fluorescently tagged Cas3
(Figure S3A; Supplemental Experimental Procedures). We were unable to
detect stable binding of Cas3 to Cascade when ATP or Mg²⁺ were omitted or
when ATP was replaced with ADP or AMP-PNP (data not shown). However, Cas3
bound stably to Cascade when ATP and Mg²⁺ were included in the reactions
(Figures 5A and 5B). Cas3 located Cascade through 3D diffusion during
initial recruitment (see Figure 5D). Once bound, ~55% of the Cas3
molecules remained stationary within optical resolution limits (Figure
5B). These seemingly stationary molecules exhibited two distinct
lifetimes: one population with a lifetime (τ1) of ~6 s and a second
population (τ2) with a lifetime of >1 min (Figures 5C, S3B, and S3C).
These findings suggest that Cas3 transiently samples target-bound Cascade
before transitioning into a more stably bound state and that entry into
this longer-lived state requires ATP hydrolysis. Interestingly, once a
longer-lived Cas3 binding event was observed at a given molecule of
Cascade, that particular Cascade complex appeared incapable of recruiting
any additional Cas3 at the protein concentrations used in these assays.
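
A survival curve with two lifetimes of this kind is commonly described by a sum of two exponentials. The sketch below shows that functional form only; it is not the fitting procedure used in the study. The value of 60 s for τ2 is a lower bound taken from ">1 min", and the fraction of short-lived events (frac_short) is a hypothetical placeholder, since the text does not report it.

```python
import math


def stationary_survival(t_s, tau1_s=6.0, tau2_s=60.0, frac_short=0.5):
    """Fraction of stationary Cas3 still bound after t_s seconds.

    Two-exponential model: a short-lived population (tau1_s) and a
    longer-lived population (tau2_s). frac_short is an assumed amplitude.
    """
    short = frac_short * math.exp(-t_s / tau1_s)
    long_lived = (1.0 - frac_short) * math.exp(-t_s / tau2_s)
    return short + long_lived


for t in (0.0, 6.0, 30.0, 60.0):
    print(f"t = {t:5.1f} s  survival = {stationary_survival(t):.3f}")
```

At short times the curve is dominated by the ~6 s population; at long times only the longer-lived population remains, which is why the two lifetimes can be separated in a survival plot.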

Cas3 Is a Highly Processive Molecular Motor 

Many of the Cas3 molecules (~45%) translocated along the DNA
(Figures 5B, 5D, and 5E). In these instances, Cas3 was recruited
to Cascade at the λ3 protospacer and then moved rapidly away from the
protospacer in a direction consistent with 3'→5' translocation on the
non-target strand, as expected from bulk biochemical experiments
(Mulepati and Bailey, 2013). There was no evidence that Cas3
translocation could initiate from any location on the DNA other than the
λ3 protospacer, and Cas3 translocation was entirely dependent on the
presence of Cascade. Remarkably, Cascade remained tightly bound to the
protospacer even after Cas3 had begun translocating along the DNA
(Figure 5E). Moreover, once Cas3 had translocated away from Cascade, no
additional molecules of Cas3 could bind to or translocate away from that
particular Cascade complex.

Cas3 exhibited a short delay prior to moving away from Cascade (Figure
5D); analysis of these delay times revealed two lifetimes that were
similar to the τ1 and τ2 lifetimes for the stationary Cas3 population,
suggesting that the observed intermediates reflected the same underlying
molecular processes (Figures 5C, 5F, S3B, and S3C). Cas3 traveled at a
mean velocity of ~316 bp/s for ~12,000 bp before stalling or dissociating
from the DNA (Figures 5G and 5H), and >99% of molecules exhibited
unidirectional movement (Figures 5D and 5E; Movie S1; see below). Three
key observations suggested that Cas3 was not extensively degrading the
DNA during translocation. First, there was no evidence that the
translocating population of Cas3 caused double-strand breaks. Second, we
saw no evidence for long ssDNA tracts when reactions were chased with
RPA-eGFP. Finally, if Cas3 had generated tracts of ssDNA long enough to
be optically detected, then Cascade would also appear to move in the same
direction because of the change in persistence length that accompanies
the conversion of dsDNA to ssDNA, but Cascade always remained stationary
at the protospacer. We conclude that Cas3 is a highly processive
molecular motor that first generates a small ssDNA gap and then
translocates in the 3'→5' direction along the non-target DNA strand away
from Cascade.
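
To put these numbers in perspective, the short sketch below converts the mean velocity and processivity into a typical run duration and a contour length. The function names are hypothetical, and the 0.34 nm rise per base pair is the standard value for B-form DNA, an assumption brought in here rather than a figure from the text; DNA stretched by buffer flow is not fully extended, so the observed displacement would be somewhat shorter.

```python
BP_RISE_NM = 0.34  # rise per base pair of B-form DNA (assumed, not from the text)


def run_duration_s(distance_bp=12_000, velocity_bp_per_s=316.0):
    """Time needed to cover distance_bp at a constant mean velocity."""
    return distance_bp / velocity_bp_per_s


def contour_length_um(distance_bp=12_000):
    """Contour length of distance_bp of fully extended B-form DNA, in micrometers."""
    return distance_bp * BP_RISE_NM / 1000.0


print(f"run duration: about {run_duration_s():.0f} s")          # 12,000 / 316 is ~38 s
print(f"contour length: about {contour_length_um():.1f} um")   # ~4.1 um
```

A run of this length spans several micrometers, far above the optical resolution limit, which is why individual translocation events appear as clear diagonal tracks in the kymographs.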

Evidence for Looped DNA Intermediates 

Surprisingly, in addition to our observation that Cas3 recruitment 
and translocation did not coincide with the ejection of Cascade 
from the DNA, inspection of the Cas3 translocation trajectories 
revealed evidence that the contacts between Cas3 and Cascade 
were not immediately broken. In many instances (14%), Cas3 
began to translocate along the DNA, but then returned almost 
instantaneously to the original binding site (Figure 5I). This
behavior coincided with an increase in Cas3 fluorescence, sug- 
gesting that the molecules were pulled closer to the surface of 
the flow cell because of increased tension on the DNA. These ob- 
servations are most consistent with looped DNA intermediates, 
where Cas3 maintains contact with Cascade, while simulta- 
neously translocating for a short distance along the flanking 
duplex DNA (Figure S3D). We conclude that Cas3 can initially 
remain bound to Cascade as it begins translocating along the 
DNA and that a subset of these molecules generates optically 
detectable DNA loops. 

PAM Is Essential for Cascade-Mediated Recruitment 
of Cas3 

Next, we sought to determine whether Cascade could recruit Cas3 to
mutλ3, and if so, whether the properties of Cas3 differ in the absence of
a cognate PAM. Interestingly, Cas3 did not colocalize with Cascade at
mutλ3 (Figure S4A), and we were unable to detect even transient binding
of Cas3 to Cascade at the mutλ3 protospacer. We were also unable to
detect RPA-eGFP foci at mutλ3 (Figures S4B-S4E), and Cas3 did not cleave
plasmid substrates bearing the mutλ3 protospacer (see below). We conclude
that Cascade cannot recruit Cas3 to DNA in the absence of a cognate PAM,
in agreement with previous bulk biochemical experiments (Hochstrasser et
al., 2014; Mulepati and Bailey, 2013).

PAM-Independent Recruitment of Cas3 by Cas1-Cas2 

Cas3 is required for primed sequence acquisition (Datsenko et al., 2012),
suggesting that alternative pathways must exist to recruit Cas3 to escape
targets. Cas1 and Cas2 are universally conserved across CRISPR types and
are also necessary for primed sequence acquisition, suggesting the
possibility that these proteins may work in concert with Cascade to
promote the recruitment of Cas3 to escape targets. Therefore, we next
asked whether the Cas1-Cas2 complex might affect target recognition,
target processing, or both, in reactions with Cas3. Attempts to generate
fluorescently tagged Cas1 or Cas2 yielded inactive proteins, and
therefore these experiments utilized wild-type (unlabeled) Cas1-Cas2.

Remarkably, the addition of Cas1-Cas2 enabled the recruitment of Cas3 to
mutλ3 and also enhanced recruitment of Cas3 to λ3 by ~3-fold (Figures 6A
and 6B; Movies S1 and S2). The velocity and processivity of Cas3 were not
altered by Cas1-Cas2 (Figures S5A and S5B). However, Cas3 recruited to
the escape target behaved markedly differently from Cas3 that was
recruited to the cognate protospacer. Most strikingly, Cas3 targeted to
mutλ3 could rapidly translocate in either direction away from Cascade
(Figure 6C; Movie S3). Moreover, Cas3 exhibited only a ~6 s delay prior
to moving away from mutλ3, but there was no evidence for the second
longer-lived intermediate (τ2) that was always observed at λ3 (Figures
S3B, S3C, and S5C). There was also no evidence for ssDNA gaps at mutλ3 in
the presence of Cas1-Cas2 (Figure S5D), and bulk biochemical assays with
Cascade, Cas1-Cas2, and Cas3 revealed no nicking or cleavage of plasmids
with the mutλ3 protospacer (Figures S6A and S6B), even though Cascade was
capable of binding the mutλ3 protospacer in bulk assays (Figure S6C).
Finally, there was no evidence for Cas3-mediated DNA looping at mutλ3 in
reactions with Cas1-Cas2 (Figure 6C). Together, these results show that
Cas1-Cas2 are necessary to recruit Cas3 to mutλ3 and attenuate the
nuclease activity of Cas3 at these escape targets, enabling Cas3 to
translocate away from Cascade in either direction along the foreign DNA.

Cas1-Cas2 also appeared to affect the behavior of Cas3 at the λ3
protospacer. Specifically, Cas1-Cas2 partially attenuated Cas3 nuclease
activity in bulk biochemical assays (Figures S6A and S6B), and the
presence of Cas1-Cas2 also enabled iterative Cas3 binding and
translocation events from the same Cascade complex bound to the λ3
protospacer (Movie S4). This observation was in stark contrast to
reactions done in the absence of Cas1-Cas2, where we never detected
evidence of multiple Cas3 recruitment events to the same Cascade complex.
These findings suggest that Cas1-Cas2 not only enhances the recruitment
of Cas3 to Cascade bound at λ3 but may also enable iterative Cas3 loading
events.

Figure 6. Cas1-Cas2-Mediated Recruitment of Cas3 to Escape Targets

(A) Binding distribution of Cas3 on the DNA in the absence of Cas1-Cas2.

(B) Cas3 binding distribution histogram on the DNA in the presence of
Cas1-Cas2.

(C) Overlaid trajectories showing examples of Cas3 translocation events
originating from either the λ3 protospacer (green) or the mutλ3
protospacer (magenta). Of the trajectories originating from mutλ3, 59% of
the Cas3 molecules move toward the downstream anchor points, and the
remaining 41% travel in the opposite direction.

See also Figures S4 and S5 and Movies S2, S3, and S4. 

DISCUSSION 

CRISPR-Cas immunity involves complex interplay among 
multiple macromolecular components, with the potential for 
overlapping or convergent pathways. Our work reveals two 
distinct pathways for target recognition and processing and 
shows that the choice of pathway is dictated by the presence 
or absence of a PAM sequence adjacent to the targeted proto- 
spacer (Figure 7). 

A Conserved Mechanism for PAM-Dependent Target 
Recognition 

Our results support a model in which an initial search for PAM sequences
is the predominant mode of DNA surveillance by E. coli Cascade (Figure
7A). Once a PAM is identified, Cascade interrogates the flanking DNA for
sequence complementarity to the crRNA via directional unwinding of the
DNA beginning at the PAM, and identification of a matching protospacer
leads to stable capture and R-loop formation (Rutkauskas et al., 2015).
This PAM-dependent search process is strikingly similar to that of
S. pyogenes Cas9, the crRNA-guided surveillance complex in type II
CRISPR-Cas systems, which also initiates the search by looking for PAMs
(Sternberg et al., 2014). In addition, the type IF CRISPR-Cas system of
Pseudomonas aeruginosa also searches for PAM sequences before probing the
flanking DNA for sequence complementarity to the crRNA (Rollins et al.,
2015). The type II CRISPR-Cas systems require only a single polypeptide
for target recognition and cleavage, whereas type I CRISPR-Cas systems
require large multimeric complexes for target recognition and a separate
trans-acting protein (Cas3) for DNA cleavage. Cas9 and Cse1 share no
amino acid sequence homology, and the Cas9 PAM (5'-NGG-3') and the
Cascade PAM (5'-A[A/T]G-3') are located on opposite ends of the
protospacer and on different DNA strands (Jinek et al., 2012; Sashital et
al., 2012). Given these differences, there was no reason to assume that
S. pyogenes Cas9 and E. coli Cascade would search for target sites using
the same general mechanism. The similarities between Cascade and Cas9
suggest that an initial search for PAMs may be a broadly conserved
mechanism for DNA surveillance among the type I and type II CRISPR-Cas
systems.

Facilitated Diffusion versus Reduced Complexity 

It is often assumed that site-specific DNA binding proteins accelerate
target searches relative to 3D diffusion by facilitated diffusion, which
reduces the dimensionality of the search process through 1D sliding,
hopping, and/or intersegmental transfer (von Hippel and Berg, 1989).
However, there is little evidence supporting this general assumption
(Halford, 2009). The Cascade target search is remarkably similar to that
of Cas9, which also exhibits no evidence of 1D sliding (Sternberg et al.,
2014). Instead, we find that Cascade and Cas9 both appear to optimize
their target searches by reducing the complexity of the sequence space
that is sampled while surveying DNA. They accomplish this task by first
looking for a small portion of the overall binding site, the PAM, before
probing the flanking DNA for sequences complementary to the crRNA, which
provides an additional layer of discrimination enabling Cascade to sample
and reject incorrect targets (Rutkauskas et al., 2015; Sternberg et al.,
2014). The effectiveness of this strategy can be illustrated by
considering that, based on sequence composition alone, Cascade can avoid
~90% of the λ genome just by utilizing the PAM as an initial recognition
signal, while kinetically ignoring other sequences. The finding that much
higher Cascade concentrations are necessary to achieve similar occupancy
at protospacers with an escape PAM compared to those with a cognate PAM
also reflects the effectiveness of reducing search complexity.
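
The idea of reducing search complexity can be illustrated by counting how many positions in a sequence carry the Cascade PAM motif quoted above (5'-A[A/T]G-3'). The sketch below is a toy example with hypothetical function names and a made-up test sequence; it scans both strands and makes no attempt to reproduce the ~90% estimate, whose exact accounting is not given in the text.

```python
import re

PAM_PATTERN = re.compile(r"A[AT]G")          # 5'-A[A/T]G-3' as written in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_pam_sites(seq):
    """Count A[A/T]G motifs on both strands (matches cannot overlap each other)."""
    seq = seq.upper()
    forward = len(PAM_PATTERN.findall(seq))
    reverse = len(PAM_PATTERN.findall(reverse_complement(seq)))
    return forward + reverse


def pam_site_fraction(seq):
    """PAM motifs per trinucleotide window, summed over both strands."""
    windows = 2 * max(len(seq) - 2, 0)
    return count_pam_sites(seq) / windows if windows else 0.0


toy = "AAGCCCATG"  # made-up sequence, not from the lambda genome
print(count_pam_sites(toy), round(pam_site_fraction(toy), 3))
```

For a random sequence with equal base frequencies, a given trinucleotide window matches A[A/T]G with probability 2/64, so only a small minority of positions would need to be interrogated further under a PAM-first search.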

PAM-Dependent Target Processing 

The PAM-dependent pathway requires only Cascade to recruit 
Cas3 to protospacers (Figure 7B). Cas3 first transiently samples 
Cascade before transitioning into a stably bound complex. 



Cell 163, 854-865, November 5, 2015 ©2015 Elsevier Inc. 861

Formation of this longer-lived species prevents any further
Cascade-specific recruitment of Cas3, most likely because the first
stably bound Cas3 cleaves the R-loop, which destroys the Cas3 binding
site (Mulepati and Bailey, 2013). Consistent with this interpretation,
formation of stable Cascade-Cas3 intermediates coincides with the
appearance of a ~200- to 300-nt ssDNA gap adjacent to the protospacer.
The first molecule of Cas3 does not appear to induce any damage other
than creating this initial ssDNA gap. This finding is notably different
from bulk biochemical assays, which reveal more extensive DNA degradation
(Mulepati and Bailey, 2013; Sinkunas et al., 2013). This difference may
be explained by the potential for recruitment of additional Cas3
molecules in the bulk biochemical assays through a Cascade-independent
pathway, as previously suggested (Mulepati and Bailey, 2013). Consistent
with this explanation, Cas3 is a potent ssDNA nuclease even in the
absence of Cascade (Mulepati and Bailey, 2013; Sinkunas et al., 2011,
2013). Thus, the ssDNA gaps generated by the first molecule of Cas3
likely reflect an early intermediate in the degradation pathway and serve
as an entryway for additional ssDNA-specific nucleases, including Cas3 or
perhaps other host enzymes. Together, these findings



Figure 7. Model for Foreign DNA Recognition and Processing by Cascade,
Cas1, Cas2, and Cas3

(A) The predominant mechanism for protospacer recognition is through the
PAM-dependent pathway.

(B) PAM-dependent processing involves the recruitment of Cas3 to the
protospacer by Cascade. Cas3 nicks the R-loop and generates an ssDNA gap;
Cas3 can dissociate at either of these two steps. Cas3 then breaks free
from Cascade and travels unidirectionally along the DNA.

(C) PAM-independent processing requires Cas1-Cas2 to recruit Cas3. Cas3
is loaded onto the DNA in one of two possible orientations through a
mechanism that attenuates Cas3 nuclease activity. Cas3 then travels in
either direction along the DNA as part of a spacer acquisition complex.

See also Figure S6.



suggest that the early stages of foreign DNA degradation involve the
ATP-dependent recruitment of just one molecule of Cas3 through a
mechanism that requires Cascade-specific contacts and an intact R-loop.
This initial transient binding event exhibits a ~6-s lifetime (τ1) before
Cas3 transitions into a more stably bound intermediate. The first stably
bound molecule of Cas3 then generates a short ssDNA gap, reflected in the
delay time (τ2) prior to moving away from Cascade, and after being
released from Cascade, this Cas3 molecule can either dissociate into
solution or continue traveling along the remaining duplex DNA. Any
subsequent recruitment of Cas3 (or other nucleases) occurs through
nonspecific interactions with the resultant ssDNA gap.

Cascade remains tightly bound to the DNA even after Cas3 
generates an ssDNA gap and moves away from the protospacer. 
It is possible that continued presence of Cascade may distin- 
guish these Cas3-generated gaps from other ssDNA gaps that 
can be produced during normal DNA metabolism, and Cascade 
may perhaps prevent host DNA repair proteins from filling in 
these gaps before the invading DNA is eventually destroyed. 

PAM-Dependent Cas3 Motor Activities 

Cas3 is a fast and highly processive molecular motor, which is recruited
by Cascade through the PAM-dependent pathway and then translocates along
the flanking DNA. This translocation does not coincide with any apparent
DNA degradation or persistently unwound DNA. When Cas3 is recruited by
Cascade through the PAM-dependent pathway, it always moves in the same
direction along the DNA, consistent with expectations for 3'→5'
translocation along the non-target strand. A subset of Cas3 molecules
also forms optically detectable looped intermediates, and Cas3 likely
generates smaller DNA loops that cannot be observed in our experiments,
suggesting these looped intermediates may be a common feature of the
PAM-dependent pathway (Figure 7B). Interestingly, similar looping
behaviors have been reported for many different SF1 and SF2 helicases,
including Saccharomyces cerevisiae Pif1 (Zhou et al., 2014), Bacillus
stearothermophilus PcrA (Park et al., 2010), E. coli Rep (Myong et al.,
2005), and S. cerevisiae Srs2 (Qiu et al., 2013). The looping behaviors
exhibited by these proteins are thought to help establish and maintain a
particular structural state of the DNA; for instance, PcrA and Srs2
repeatedly shuttle back and forth, while removing proteins from ssDNA
proximal to an ssDNA/dsDNA junction to prevent aberrant recombination
(Park et al., 2010; Qiu et al., 2013). Similarly, Pif1 repeatedly unwinds
G-quadruplexes, ensuring that these structures do not inhibit DNA
replication (Zhou et al., 2014). The looping activity observed for Cas3
may reflect attempts to dissociate from Cascade. Alternatively, looping
may help keep the ssDNA gap clear of proteins, free of secondary
structures, or both, until the arrival of additional Cas3 molecules or
other accessory nucleases.

PAM-Independent Target Recognition 

Like the PAM-dependent search, the PAM-independent pathway also occurs by
microscopic 3D diffusion, suggesting that Cascade must test for
complementarity to the crRNA by either transiently melting the DNA or by
taking advantage of the intrinsic breathing of the DNA duplex (Figure
7A). One primary difference between PAM-dependent and PAM-independent
target recognition is that the efficiency of the PAM-independent pathway
is compromised, such that a higher concentration of Cascade is required
to achieve similar levels of occupancy at both targets. Despite this
disparity in apparent association constants, Cascade can still bind
tightly to DNA regardless of whether or not the protospacer has a
canonical PAM. In both instances, the lifetime of the target-bound
Cascade complexes is significantly longer than the typical doubling time
of E. coli, a finding that is in good agreement with the results of
magnetic tweezer experiments (Szczelkun et al., 2014). This tight binding
would help ensure that even though escape target recognition is
inefficient, in the rare instances in which an escape target is captured,
Cascade would remain in place long enough to initiate downstream steps
necessary for primed sequence acquisition (Figure 7C; see below).
Interestingly, not all PAM mutations are equal with respect to Cascade,
and the defect in binding with the ATT mutant PAM is more moderate than
some other PAM mutations (Szczelkun et al., 2014). Future studies will be
essential for testing the effects of other PAM mutations on target
binding in these single-molecule assays.

Interestingly, recent single-molecule fluorescence resonance 
energy transfer (FRET) experiments have suggested that 
Cascade recognizes escape targets with substantially reduced 
fidelity, and interactions with these targets are characterized 
by a ~25-s lifetime (Blosser et al., 2015), which is identical to 
one of the nonspecific lifetimes observed in our experiments 
(Figure S1). We suggest that these shorter-lived complexes
found by FRET reflect intermediates that have failed to transition 
into the more tightly bound complexes observed in our assays. 

Importantly, PAM escape mutations reflect only a subset of mutations that
can lead to a priming response, with the remainder occurring within the
protospacer, but both types of escape mutants lead to similar priming
responses (Datsenko et al., 2012; Fineran et al., 2014). We anticipate
that Cascade will locate protospacer escape mutants through the normal
PAM-dependent search pathway, but then may require Cas1-Cas2 to recruit
Cas3 and initiate a priming response from this class of escape mutations.

Cas1-Cas2 Recruitment of Cas3 to Escape Targets 

We demonstrate that the Cas1-Cas2 complex serves as a trans-acting factor
necessary for the recruitment and regulation of Cas3 at protospacers
bearing an escape PAM (Figure 7C). Recruitment may occur through one of
two general mechanisms. Cas1-Cas2 may modify the structure of Cascade
such that it can now directly recruit Cas3 by the same process as occurs
during PAM-dependent recruitment. Alternatively, protein-protein contacts
with Cas1-Cas2 may directly recruit Cas3 to the escape target through a
mechanism that is distinct from the Cascade-dependent recruitment at
cognate protospacers. Importantly, the behavior of Cas3 at the escape
targets differs markedly from the behavior of Cas3 at cognate targets.
First, Cas3 can translocate in either direction from the escape targets,
implying that Cas3 is loaded onto the flanking phage DNA through a
different pathway than is observed at cognate protospacers. Second, there
was no evidence that Cas3 generates ssDNA gaps at the escape targets, nor
was there any evidence that Cas3 even nicked the DNA when loaded at
escape targets, suggesting that the nuclease activity of Cas3 is fully
attenuated at escape targets. The inability of Cas3 to cleave the escape
target is also consistent with the fact that the vast majority of cells
will die when infected with phage bearing an escape mutation, and
immunity is only conferred for those rare survivors that successfully
update the CRISPR locus (Datsenko et al., 2012). Third, Cas3 loaded at
escape targets exhibited only a ~6-s lifetime prior to initiating
translocation, but there was no evidence for the longer-lived
intermediate (τ2) that we have ascribed to ssDNA degradation. Fourth,
there was no evidence for DNA looping when Cas3 initiated translocation
from the escape target, suggesting that Cas3 is more readily released
from Cascade at the escape target. Together, these observations suggest
that Cas1-Cas2 recruits and loads Cas3 onto the DNA flanking the escape
targets through a mechanism that is distinct from the Cascade-mediated
mechanism that takes place at cognate protospacers.

Primed Acquisition of New Spacer Sequences 

Together, our data provide direct support for a model of primed 
sequence acquisition involving Cas1-Cas2-mediated recruit- 
ment of Cas3 to Cascade at escape targets, followed by 
ATP-dependent translocation of Cas3 along the foreign DNA 
(Figure 7C). Cas3 can move in either direction away from the 
escape target, consistent with the expectation that new spacers 
can be acquired from either side of an escape target (Richter 
et al., 2014). Translocation of Cas3 away from the escape target 
does not induce DNA damage, and we speculate that Cas3 may 
be looking for an as-yet-unidentified signal (e.g., DNA sequence,
partner protein, or both) necessary to activate its nuclease activity, or
the nuclease activity of a partner protein, at some distal location.
Importantly, although the tagged Cas6e subunits remain bound to the
protospacer after Cas3 translocation, we do not know whether the other
Cas proteins are also left behind.
It is possible that Cas3 takes a subset of Cascade components 
while translocating along the DNA. In fact, Cas3 is naturally 
linked with Cse1 in a single polypeptide chain in some systems, 
suggesting that Cse1 may have additional downstream func- 
tions during Cas3 translocation (Westra et al., 2012b). In addi- 
tion, Cas1-Cas2 are essential to process and insert new spacer 
sequences into the CRISPR locus (Nunez et al., 2014; Nunez 
et al., 2015), and one attractive model is that Cas1-Cas2 travel 
with Cas3 as part of a larger spacer acquisition complex (Fig- 
ure 7C), which would allow delivery of Cas1-Cas2 to sites distal 
to an escape target, where they would then be able to process 
the DNA to promote new spacer acquisition. In support of this 
model, studies in the closely related type IF CRISPR-Cas system from
Pectobacterium atrosepticum have shown that Cas3 interacts directly with
Cas1 (Richter et al., 2012).

Early models suggested Cascade might diffuse away from the 
escape target (Datsenko et al., 2012). However, this model was 
later disfavored because the distribution of new spacers ac- 
quired from a circular plasmid was inconsistent with expecta- 
tions for a diffusion-based mechanism, which would predict a 
strong bias toward acquisition of new spacer sequences near 
the original protospacer (Savitskaya et al., 2013). The high proc- 
essivity of E. coli Cas3 (~12-kbp) explains why assays using 
relatively small plasmids (~5-kb) fail to yield a biased distribution 
of newly acquired spacer sequences as predicted by the 
original sliding hypothesis (Heler et al., 2014; Savitskaya et al., 
2013). Interestingly, the type IF CRISPR-Cas system from P. atrosepticum
does exhibit a biased distribution of newly acquired spacers in response
to an escape mutation (Richter et al., 2014). Assuming that priming
occurs through a similar mechanism for the type IF and type IE CRISPR-Cas
systems, our model predicts that P. atrosepticum Cas3 is less processive
than E. coli Cas3, explaining why spacer acquisition bias can be
observed in plasmid assays for P. atrosepticum.
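
The argument relating processivity to plasmid size can be made concrete with a simple model. The sketch below assumes, purely for illustration, that Cas3 run lengths follow a single-exponential distribution whose mean equals the reported processivity; the text does not state the shape of the distribution, and the function name is hypothetical.

```python
import math


def fraction_reaching(distance_bp, mean_run_bp=12_000.0):
    """Fraction of translocation runs that travel at least distance_bp.

    Assumes exponentially distributed run lengths with mean mean_run_bp.
    """
    return math.exp(-distance_bp / mean_run_bp)


plasmid_bp = 5_000  # approximate plasmid size quoted in the text
print(f"runs covering the whole plasmid: {fraction_reaching(plasmid_bp):.2f}")
print(f"same plasmid, 1-kb mean run:     {fraction_reaching(plasmid_bp, 1_000.0):.3f}")
```

Under this assumption roughly two thirds of runs by a ~12-kb motor would cover an entire ~5-kb plasmid, leaving little room for positional bias, whereas a hypothetical motor with a 1-kb mean run would rarely get that far and would concentrate activity near the protospacer.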

Our data demonstrate that the first Cas3 molecule recruited to cognate
protospacers through the PAM-dependent pathway can translocate rapidly
away from Cascade before the DNA is destroyed. Moreover, the nuclease
activity of Cas3 was partially attenuated by Cas1-Cas2 at cognate
protospacers, allowing iterative Cas3 firing events presumably before the
eventual destruction of the R-loop. Together, these observations suggest
that priming might take place even when there is no escape mutation
present in the invading DNA (Figure 7B). The ability to occasionally
acquire new spacers in the absence of an escape mutation may allow
microbes to routinely update the CRISPR locus even before foreign genetic
elements have the opportunity to evade the CRISPR-Cas immune response by
acquiring new mutations.

EXPERIMENTAL PROCEDURES 
Single-Molecule Assays 

DNA curtains were fabricated by electron-beam lithography as previously
described (Greene et al., 2010; Sternberg et al., 2014). A lipid bilayer
was then deposited on the surface of the sample chamber; the anchor
points were coated with anti-digoxigenin antibodies; and the DNA was
anchored to the bilayer through a biotin-streptavidin linkage. The DNA
was then aligned along the leading edges of the Cr diffusion barriers and
coupled to the antibody-coated anchors through the application of
hydrodynamic force. Cascade single-molecule binding assays were conducted
in reaction buffer containing 40 mM Tris-HCl (pH 7.4), 1 mM MgCl2, 25 mM
KCl, 1 mg/ml BSA, 0.8% glucose, YOYO-1, and a glucose oxidase-catalase
oxygen-scavenging system. The Cas6e subunit of Cascade was expressed with
an N-terminal 3xFLAG tag, and the Cascade complex was labeled with
anti-FLAG-coated QDs (Invitrogen) for 10 min on ice prior to use. In
experiments with Cas3, the YOYO-1 dye was omitted, and the reaction
buffer was supplemented to contain 2 mM MgCl2, 1 mM ATP, and 20 μM CoCl2.
Cas3 was labeled by incubation with streptavidin-coated QDs (Invitrogen)
on ice for 20 min prior to injection onto the flow cell at 4 nM final
concentration. RPA-eGFP labeling of ssDNA gaps was always performed at
the end of the Cas3 experiments as a check for activity. In these assays,
Cas3 was flushed from the sample chamber, followed by delivery of buffer
containing 100 nM RPA-eGFP. The buffer flow was then terminated, and
RPA-eGFP was incubated with the DNA for 10 min prior to imaging. Buffer
conditions for experiments containing Cas1-Cas2 were identical to those
above. Cas1 (8 nM) and Cas2 (16 nM) were pre-incubated on ice for 20 min
and then mixed with Cas3 (4 nM) for an additional 5 min before being
delivered to the flow cell. All single-molecule experiments were
conducted at 25°C, unless otherwise indicated, and all data were
collected and analyzed as previously described (Sternberg et al., 2014).

SUMMARY

A deficiency in pejvakin, a protein of unknown function, causes a
strikingly heterogeneous form of human deafness. Pejvakin-deficient
(Pjvk-/-) mice also exhibit variable auditory phenotypes. Correlation
between their hearing thresholds and the number of pups per cage suggests
a possible harmful effect of pup vocalizations. Direct sound or
electrical stimulation shows that the cochlear sensory hair cells and
auditory pathway neurons of Pjvk-/- mice and patients are exceptionally
vulnerable to sound. Subcellular analysis revealed that pejvakin is
associated with peroxisomes and required for their
oxidative-stress-induced proliferation. Pjvk-/- cochleas display features
of marked oxidative stress and impaired antioxidant defenses, and
peroxisomes in Pjvk-/- hair cells show structural abnormalities after the
onset of hearing. Noise exposure rapidly upregulates Pjvk cochlear
transcription in wild-type mice and triggers peroxisome proliferation in
hair cells and primary auditory neurons. Our results reveal that the
antioxidant activity of peroxisomes protects the auditory system against
noise-induced damage.

INTRODUCTION

Mutations of PJVK, which encodes pejvakin, a protein of un- 
known function present only in vertebrates, cause the DFNB59- 
recessive form of sensorineural hearing impairment. In the first 
patients described (Delmaghani et al., 2006), the impairment 
was restricted to neurons of the auditory pathway, with auditory 
brainstem responses (ABRs) displaying abnormally decreased 
wave amplitudes and increased inter-wave latencies (Starr and 
Rance, 2015). ABRs monitor the electrical response of auditory
pathways to brief sound stimuli, from the primary auditory neu- 
rons synapsing with the sensory cells of the cochlea, the inner 
hair cells (IHCs), to the colliculus in the midbrain (Moller and Jan- 
netta, 1983). However, some DFNB59 patients were found to 
have a cochlear dysfunction, as shown by an absence of the 
otoacoustic emissions (OAEs) that are produced by the outer 
hair cells (OHCs), frequency-tuned cells endowed with electro- 
motility that mechanically amplify the sound stimulation of neigh- 
boring IHCs (Ashmore, 2008). These patients had truncating 
mutations of PJVK, whereas the previously identified patients, 
with extant OAEs, had missense mutations (p.T54I or p.R183W)
(Ebermann et al., 2007; Schwander et al., 2007; Borck et al., 
2012). However, the identification of patients also carrying the 
P.R183W missense mutation but lacking OAEs (Collin et al., 
2007) refuted any straightforward connection between the nature 
of the PJVK mutation and the hearing phenotype. The severity of 
deafness in DFNB59 patients varies from moderate to profound 
Figure 1. Hearing Loss Variability and Greater Sensitivity to
Controlled Sound Exposure in Pjvk-/- Mice

(A) ABR thresholds at 10 kHz in P30 Pjvk+/+ (n = 26 mice) and Pjvk-/- (n = 48
mice) littermates.

(B) DPOAE thresholds at 10 kHz in P30 Pjvk+/+ (n = 14 mice) and Pjvk-/- (n = 48
mice) littermates. In ears with no DPOAE, even at 75 dB SPL (the highest sound
intensity tested), DPOAE thresholds were arbitrarily set at 80 dB SPL.

(C) Relationship between the number of pups raised together (determining
sound levels in the immediate environment) and ABR thresholds at 10 kHz in
P21 Pjvk-/- pups. Inset: a time-frequency analysis of a mouse pup's vocalization.
Pup calls from P0 to P21 form harmonic series of about 5 kHz, with the
most energetic harmonic at about 10 kHz. In a 12-pup litter, call levels reach
105 ± 5 dB SPL at the entrance to the ear canals of the pups.

(D) ABR thresholds at 10 kHz in P30 Pjvk+/+ and Pjvk-/- mice before (dots) and
after (crosses) controlled sound exposure. ns, not significant; ***p < 0.001.
See also Figure S1.



and may even be progressive in some patients, suggesting that 
extrinsic factors may influence the hearing phenotype. 

We investigated the role of pejvakin, with the aim of determining 
the origin of the phenotypic variability of the DFNB59 form of
deafness. Our study of Pjvk knockout mouse models and of patients
revealed an unprecedented hypervulnerability of auditory hair 
cells and neurons to sound exposure, accounting for phenotypic 
variability. We found that pejvakin is a peroxisome-associated 
protein involved in the oxidative-stress-induced proliferation of 
this organelle. Pejvakin-deficient mice revealed the key role of per- 
oxisomes in the redox homeostasis of the auditory system and in 
the protection against noise-induced hearing loss. 

RESULTS 

Heterogeneity in the Hearing Sensitivity of Pjvk-/- Mice

We generated pejvakin-null (Pjvk-/-) mice carrying a deletion of
Pjvk exon 2, resulting in a frameshift at codon position 71
(p.Gly71fs*9) (Figure S1; see the Supplemental Experimental
Procedures). ABR thresholds recorded in postnatal day 30
(P30) Pjvk-/- mice (n = 48) ranged from 35 to 110 dB SPL (sound
pressure level) at 10 kHz but never exceeded 30 dB SPL in their
Pjvk+/+ littermates (n = 26) (Figure 1A). This broad range of
hearing sensitivity in Pjvk-/- mice, from near-normal hearing to
almost complete deafness, extended across the whole frequency
spectrum. The thresholds of distortion-product OAEs
(DPOAEs) at 10 kHz (i.e., the minimum stimulus required for
DPOAE production by OHCs) also fell within an abnormally
large range of values, from 30 to 75 dB SPL, in 28 Pjvk-/-
mice, indicating an OHC dysfunction, and DPOAEs were
undetectable in another 20 Pjvk-/- mice, suggesting a complete
OHC defect (Figure 1B). The absence of pejvakin in mice thus
results in a puzzlingly large degree of hearing phenotype variability.

Hypervulnerability to the Natural Acoustic Environment
in Pjvk-/- Mice

We investigated the variability of Pjvk-/- auditory phenotypes,
by first determining the ABR thresholds of Pjvk-/- littermates
from different crosses. Large differences were observed
between crosses, with much smaller differences between the Pjvk-/-
littermates of individual crosses. Litters with larger numbers of
pups (6 to 12) had higher ABR thresholds, suggesting that the
natural acoustic environment, with the calls of larger numbers
of pups, might be deleterious in Pjvk-/- mice. Pups are vocally
very active from birth to about P20. We manipulated the level
of exposure to pup calls by randomly splitting large litters of
Pjvk-/- pups into groups of 2, 4, 6, and 10 pups per cage, with
foster mothers, before P10, i.e., several days before hearing
onset. The ABR thresholds at P21 were significantly correlated
with the number of pups raised together (p < 0.001, r² = 0.51)
(Figure 1C).

We then evaluated the effect of a controlled sound stimulation
on hearing, by presenting 1,000 tone bursts at 10 kHz, 105 dB
SPL (2-ms plateau stimulations separated by 60-ms intervals
of silence), energetically equivalent to a 3-min stay in the natural
environment of a 12-pup litter, while monitoring the ABRs during
sound exposure. These conditions are referred to hereafter as
“controlled sound exposure.” We probed the effect of sound
exposure by ABR tests, which, limited to 50 repetitions of tone
bursts, did not influence the hearing thresholds of Pjvk-/-
mice. In a sample of P30 Pjvk-/- mice with initial ABR threshold
elevation (below 35 dB SPL), controlled sound exposure affected
ABR thresholds in the 12-20 kHz frequency interval (corresponding
to the cochlear zones in which hair-cell stimulation was
strongest), with an immediate increase of 21.7 ± 10.3 dB (n =
8; p < 0.001), not observed in Pjvk+/+ mice (2.2 ± 2.4 dB, n =
12; p = 0.3) (Figure 1D). Pjvk-/- mice transferred to a silent
environment after exposure displayed a further increase of
33.7 ± 16.0 dB (n = 8) 2 days after exposure. The threshold shift
decreased to 23.7 ± 18.0 dB at 7 days, and disappeared entirely
by 14 days. When exposed mice were returned to the box with
their littermates, their ABR thresholds continued to increase, at a rate of
15 dB per week. Pejvakin deficiency thus results in particularly
high levels of vulnerability to low levels of acoustic energy, and
the increase in ABR thresholds is reversible, but only slowly
and in a quiet environment.
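
The controlled sound exposure described above can be put into numbers with a short sketch. It assumes the equal-energy principle (acoustic energy proportional to intensity multiplied by duration), counts only the 2-ms plateaus while ignoring any rise and fall ramps, and uses illustrative function names that do not come from the study.

```python
import math


def total_on_time_s(n_bursts: int, plateau_ms: float) -> float:
    """Cumulative time during which the tone is on, in seconds."""
    return n_bursts * plateau_ms / 1000.0


def session_duration_s(n_bursts: int, plateau_ms: float, gap_ms: float) -> float:
    """Total length of the exposure session (tone plus silent gaps), in seconds."""
    return n_bursts * (plateau_ms + gap_ms) / 1000.0


def equivalent_level_db(level_db: float, on_time_s: float, window_s: float) -> float:
    """Level of a continuous sound carrying the same energy over window_s.

    Assumption: equal-energy trade-off between level and duration.
    """
    return level_db + 10.0 * math.log10(on_time_s / window_s)


# Protocol values taken from the text: 1,000 bursts, 2-ms plateau, 60-ms gaps, 105 dB SPL.
on_time = total_on_time_s(1000, 2.0)
session = session_duration_s(1000, 2.0, 60.0)
leq_3min = equivalent_level_db(105.0, on_time, 180.0)

print(f"tone-on time: {on_time:.1f} s, session: {session:.1f} s")
print(f"equal-energy continuous level over 3 min: {leq_3min:.1f} dB SPL")
```

Under these assumptions the 1,000 bursts amount to 2 s of tone spread over about 62 s, which carries the same acoustic energy as a continuous sound of roughly 85 dB SPL lasting 3 min.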
Hair Cells and Auditory Pathway Neurons Are Affected
by Pejvakin Deficiency

To identify the cellular targets of the pejvakin deficiency, we
specifically probed the function of auditory hair cells and neurons in
Pjvk-/-, hair cell-conditional Pjvk knockout (Pjvkfl/flMyo15-cre+/-),
and Pjvk+/+ mice, at the age of 3 weeks, before and after
controlled sound exposure or controlled electrical stimulation.
The responses of the IHCs to sound-induced vibrations amplified
by OHCs trigger action potentials in the distal part of primary
auditory neurons, at the origin of ABR wave I. In Pjvkfl/flMyo15-cre+/-
mice, which lack pejvakin only in the hair cells, ABR
wave I amplitude and latency at 105 dB SPL specifically probed
IHC function, because IHC responses to such loud sounds
are independent of OHC activity (Robles and Ruggero, 2001).
The larger wave I latency (1.58 ms in Pjvkfl/flMyo15-cre+/- mice
[n = 20] versus 1.32 ms in Pjvk+/+ littermates [n = 30]; p <
0.001) and lower wave I amplitude (37% of the amplitude in
Pjvk+/+ littermates; p < 0.001) suggested a dysfunction of the
IHCs. Controlled sound exposure induced further decreases in
ABR wave I amplitude in Pjvk-/- and Pjvkfl/flMyo15-cre+/- mice
(48% and 55% of pre-exposure amplitude, respectively) with
respect to Pjvk+/+ mice (108%; p < 0.001 for both comparisons)
(Figure 2A), demonstrating that Pjvk-/- IHCs are hypervulnerable
to sound. As shown above, OHCs are also affected by the pejvakin
deficiency. Controlled sound exposure triggered a mean
decrease in the DPOAE amplitude of 16.9 ± 7.2 dB in the 12 to
20 kHz frequency interval in Pjvk-/- mice with persistent
DPOAEs (n = 8; p < 0.0001), and an increase in DPOAE threshold,
but it had no effect on the DPOAEs of Pjvk+/+ mice (n = 9; p =
0.51) (Figure 2B). OHCs lacking pejvakin are thus also
hypervulnerable to sound.

We investigated the effect of the absence of pejvakin on the
auditory pathway by comparing electrically evoked brainstem
responses (EEBRs) in Pjvk-/- and Pjvkfl/flMyo15-cre+/- mice
(see the Supplemental Experimental Procedures). The amplitudes
of the most distinctive EEBR waves, E II and E IV,
did not differ between the two types of mice (for wave E IV:
2.6 ± 1.8 µV in Pjvk-/- mice [n = 18] and 2.2 ± 1.2 µV in
Pjvkfl/flMyo15-cre+/- mice [n = 11]; t test, p = 0.13). However,
following controlled electrical exposure at 200 impulses/s for
1 min (as opposed to the electric-impulse stimulation at 16
impulses/s for 10 s used for pre- and post-exposure EEBR tests), E II
and E IV EEBR wave amplitudes decreased by 41% and 47%,
respectively, for at least 3 min, in Pjvk-/- mice (n = 5; paired
t test, p = 0.02 and p = 0.01, respectively), but were unaffected
in Pjvkfl/flMyo15-cre+/- mice (n = 10; p = 0.83) (Figures 2D and
2G-2I). The E II-E IV interwave interval was 0.41 ms longer in
Pjvk-/- mice (n = 5) than in Pjvkfl/flMyo15-cre+/- mice (n = 10;
p = 0.003), and controlled electrical exposure extended this
interval by a further 0.15 ms in Pjvk-/- mice only (paired t test,
p = 0.001) (Figures 2H and 2I). Likewise, the latency interval
between ABR wave I and wave IV (the counterpart of wave E IV),
abnormal in one-third of the Pjvk-/- mice tested (with an ABR
threshold < 95 dB SPL, n = 12) (Figures 2C and 2E), became abnormal
in all of them after controlled sound exposure (0.16 ms further
increase; paired t test, p < 0.001). By contrast, it remained normal
in Pjvkfl/flMyo15-cre+/- mice (n = 10 ears; p = 0.73) (Figures 2C
and 2F). Thus, the absence of pejvakin affects the propagation
of action potentials in the auditory pathway after both controlled
electrical and sound exposure in the Pjvk-/- mice.

To clarify whether these abnormalities were of neuronal or glial
origin, we performed a rescue experiment in Pjvk-/- mice, using
adeno-associated virus 8 (AAV8) vector-mediated transfer of the
murine pejvakin cDNA (AAV8-Pjvk). AAV8 injected into the
cochlea transduces the primary auditory neurons (cochlear
ganglion neurons) and neurons of the cochlear nucleus (Figure S2A),
but not the hair cells. All Pjvk-/- mice (n = 7) injected on P3 and
tested on P21 had normal ABR interwave I-IV latencies
(Figure 2J), and their EEBR wave E IV amplitude was insensitive to
controlled electrical stimulation (1.91 ± 0.97 µV before and
1.87 ± 1.07 µV after stimulation; paired t test, p = 0.59) (Figures
2K and 2L). The absence of pejvakin thus renders auditory
pathway neurons hypervulnerable to exposure to mild, short
sound stimuli.

Hypervulnerability to Sound in DFNB59 Patients 

We then investigated whether the hearing of DFNB59 patients
was also hypervulnerable to sound exposure. We tested five
patients carrying the p.T54I mutation (Delmaghani et al., 2006).
Transient-evoked OAEs (TEOAEs) assessing OHC function
over a broad range of frequencies were detected for all ears,
despite the severe hearing impairment (hearing threshold
increasing from 66 dB HL at 250 Hz to 84 dB at 8 kHz). Following
minimal exposure to impulse stimuli (clicks at 99 dB nHL), ABR
waves were clearly identified in response to 250 clicks. When
exposure was prolonged to 1,000 clicks (the standard
procedure), wave V, the equivalent of mouse ABR wave IV, which
was initially conspicuous, displayed a decrease in amplitude
(to 39% ± 30% of its initial amplitude) and an increase in latency
(of 0.30 ± 0.15 ms) (Figures 3A, 3C, and 3D). In parallel, the
I-V interwave interval increased by 0.30 ± 0.15 ms. Wave V
amplitude and latency recovered fully after 10 min of silence
(Figure 3B). In control patients with sensorineural hearing
impairment of cochlear origin matched for ABR thresholds,
similar sound stimulation did not affect ABR wave V amplitude
(105% ± 14% of the initial amplitude after exposure; n = 13
patients) or latency (-0.02 ± 0.07 ms change after exposure)
(Figures 3C and 3D). Exposure of the DFNB59 patients to 1,000
clicks also affected TEOAEs (6.1 ± 5.2 dB nHL decrease in
amplitude; paired t test, p = 0.02). Therefore, as in pejvakin-deficient
mice, the cochlear and neuronal responses of DFNB59 patients
were affected by exposure to low-energy sound.

Redox Status Abnormalities and ROS-Induced Cell
Damage in the Cochlea of Pjvk-/- Mice

We studied the Pjvk-/- cochlea by light microscopy on semithin
sections and electron microscopy. On P15 and P21, both OHCs
and IHCs were normal in number and shape. Their hair bundles
(the mechanoreceptive structures responding to sound), the
ribbon synapses of the IHCs, and their primary auditory neurons
were unmodified (data not shown). On P30, we observed the
loss of a few OHCs (16% ± 11%, n = 5 mice), restricted to the
basal region of the cochlea (tuned to high-frequency sounds).
From P30 onward, OHCs, cochlear ganglion neurons, and then
IHCs disappeared, and the sensory epithelium (organ of Corti)
progressively degenerated (Figure S3A).
Figure 2. Effects on Auditory Function of Brief Exposure to Moderately Intense Stimuli in Pjvk+/+, Pjvk-/-, and Pjvkfl/flMyo15-cre+/- Mice

(A-C) ABR wave I amplitude (A), DPOAE amplitude (B), and ABR interwave I-IV latency (C) in Pjvk+/+, Pjvk-/-, and Pjvkfl/flMyo15-cre+/- mice, before (dots) and
after (crosses) controlled sound exposure, revealing the hypervulnerability to sound of both types of cochlear hair cells (IHCs and OHCs) and of the neural
pathway.

(D) EEBR wave E IV amplitude before and after controlled electrical exposure in Pjvk-/- and Pjvkfl/flMyo15-cre+/- mice was abnormal and hypervulnerable only
when pejvakin is absent from auditory neurons (Pjvk-/- mice).

(E and F) Examples of ABRs in Pjvk-/- and Pjvkfl/flMyo15-cre+/- mice: the latency of wave I is affected by controlled sound exposure in both mutant mice, and
wave IV displays an additional increase in latency only in Pjvk-/- mice.

(G-I) Examples of EEBRs in Pjvk+/+ (G), Pjvk-/- (H), and Pjvkfl/flMyo15-cre+/- (I) mice; EEBRs are affected by controlled electrical exposure only in Pjvk-/- mice.

We investigated possible changes in gene expression in the
organ of Corti of P15 Pjvk-/- mice, by microarrays (see the
Supplemental Experimental Procedures). Eighteen genes had
expression levels at least 1.5-fold higher or lower in Pjvk-/-
mice than in Pjvk+/+ mice. Marked differences were observed
for four genes involved in the redox balance (CypA, Gpx2,
c-Dct, and Mpv17), encoding cyclophilin A, glutathione
peroxidase 2, c-dopachrome tautomerase, and Mpv17, respectively
(Table S1). All of these genes were downregulated in Pjvk-/-
mice, a result confirmed by qRT-PCR (Figure S4A), and all
encode antioxidant proteins, suggesting that Pjvk-/- mice
have impaired antioxidant defenses (Table S1).

We thus assessed the level of oxidative stress in the cochlea of
P21 Pjvk-/- mice, by determining the ratio of reduced to oxidized
glutathione (GSH:GSSG). The GSSG content was about three
times larger than in Pjvk+/+ mice, whereas the GSH content
was 23% smaller, resulting in a GSH:GSSG ratio in Pjvk-/-
cochleas reduced by a factor of 3.4 (Figure 4A). Pejvakin deficiency
thus results in cochlear oxidative stress.
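
The arithmetic linking the glutathione measurements can be sketched as follows. Values are expressed relative to Pjvk+/+ cochleas (set to 1). Only the 23% GSH decrease and the 3.4-fold ratio reduction are taken from the text; the implied GSSG fold change is derived from them for illustration and is not a measured value, and the function names are hypothetical.

```python
def ratio_fold_reduction(gsh_rel: float, gssg_rel: float) -> float:
    """Factor by which GSH:GSSG drops when GSH and GSSG change to the
    given values relative to control (control ratio normalized to 1)."""
    return gssg_rel / gsh_rel


def implied_gssg_fold(gsh_rel: float, ratio_reduction: float) -> float:
    """GSSG fold change needed to produce a given reduction of the ratio."""
    return ratio_reduction * gsh_rel


gsh_rel = 1.0 - 0.23                      # GSH content 23% smaller than control
gssg_fold = implied_gssg_fold(gsh_rel, 3.4)
reduction_if_threefold = ratio_fold_reduction(gsh_rel, 3.0)

print(f"implied GSSG fold change: {gssg_fold:.2f}")
print(f"ratio reduction for an exactly 3-fold GSSG increase: {reduction_if_threefold:.2f}")
```

With GSH at 0.77 of the control value, a 3.4-fold drop in the ratio corresponds to a GSSG increase of about 2.6-fold, which is compatible with the "about three times" reported; an exactly threefold increase would give a ratio reduction closer to 3.9.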

We assessed lipid peroxidation by reactive oxygen species
(ROS) in Pjvk-/- mice, by immunofluorescence-based detection
of the by-product 4-hydroxy-2-nonenal (4-HNE). Strong
immunoreactivity was observed in P60 Pjvk-/- hair cells and cochlear
ganglion neurons (Figure S3B). Quantification of lipid
peroxidation in microdissected organs of Corti from P30 Pjvk-/- and
Pjvk+/+ mice showed a moderate, but statistically significant,
increase of the malondialdehyde content in the absence of
pejvakin (2.15 ± 0.14 µM in Pjvk-/- versus 1.84 ± 0.11 µM in
Pjvk+/+ mice; p = 0.04). Thus, pejvakin deficiency led to impaired
antioxidant defenses in the cochlea, resulting in ROS-induced
cell damage.

We then studied electrophysiological features of IHCs and
OHCs in the mature cochlea of P19-P21 Pjvk-/- mice. In IHCs,
the number of synaptic ribbons, Ca2+ currents, and synaptic
exocytosis were unaffected (Figure S5A). We investigated
whether Pjvk-/- mice display the main currents found in
mature IHCs, specifically IK,f, which plays a major part in IHC
repolarization and is involved in the high temporal precision of
action potentials in postsynaptic nerve fibers, IK,s, and IK,n (Oliver
et al., 2006). The IK,f current that flows through the large
conductance voltage- and Ca2+-activated potassium (BK) channels, a
well-known target of ROS (Tang et al., 2004), was detected in
only 4 out of 11 Pjvk-/- IHCs, and the mean number of spots
immunolabeled for the BK α-subunit per IHC was much lower in
Pjvk-/- mice (5.0 ± 1.4, n = 283 IHCs from seven mice) than in
Pjvk+/+ mice (13.9 ± 2.6, n = 204 IHCs from nine mice; t test,
p < 0.001). By contrast, the IK,s and IK,n currents were not
affected (Figures 4B and S5B). The electromotility of OHCs
was moderately impaired in Pjvk-/- mice (Figure S5C). This
contrasted with the total loss of DPOAEs in a large majority of Pjvk-/-
mice from P15 on, even at the highest possible stimulus level of
75 dB SPL. It thus pinpointed the existence of an additional
defect, likely affecting mechanoelectrical transduction, the
main determinant of DPOAEs at high stimulus levels (Avan
et al., 2013). The decrease of the cochlear microphonic potential,
which reflects mechanoelectrical transduction currents through
OHCs of the basal-most cochlear region, indeed corroborated
the DPOAE measurements: this potential, recorded for a 5-kHz
sound stimulus at 95 dB SPL, was always larger than 10 µV in
Pjvk+/+ mice (n = 8), but fell between 3 and 5 µV in the P21 Pjvk-/-
mice with residual DPOAEs (n = 2), and below 1 µV in the Pjvk-/-
mice without persisting DPOAEs (n = 6). Taken together, oxidative
stress in the Pjvk-/- cochlea impacts various electrophysiological
properties of the hair cells, particularly mechanoelectrical
transduction and current through BK channels.

Mitochondrial defects are a common cause of ROS
overproduction. However, we did not find evidence that mitochondria
were damaged, as vulnerability of the mitochondrial membrane
potential, ΔΨm, to the uncoupler carbonyl cyanide
4-(trifluoromethoxy)phenylhydrazone (FCCP) in the organ of Corti and
cochlear ganglion was similar in P17-P30 Pjvk-/- and Pjvk+/+
mice, and analysis of Pjvk-/- hair cells by transmission electron
microscopy (TEM) revealed no mitochondrial abnormalities
(Figure S5D; data not shown).

Pejvakin Is a Peroxisome-Associated Protein 

By using Pjvk-/- cochlea as control, we found that neither the
commercially available antibodies nor our initial polyclonal
antibody (Delmaghani et al., 2006) specifically recognized pejvakin
(data not shown). Given the limited divergence of the pejvakin
amino-acid sequence among vertebrates, we tried to elicit an
immune response in Pjvk-/- mice (see the Experimental
Procedures). The monoclonal antibody obtained, Pjvk-G21, labeled
peroxisomes stained by peroxisome membrane protein 70
(PMP70) antibodies in transfected HeLa cells expressing pejvakin
(Figure S6A) and in the human HepG2 hepatoblastoma cell line,
which is particularly rich in this organelle (Figure 5A). The
specificity of the Pjvk-G21 antibody was demonstrated by the
immunolabeling of peroxisomes in the hair cells of Pjvk+/+, but not of
Pjvk-/- and Pjvkfl/flMyo15-cre+/- mice (Figures 5B and S6B).

Prediction programs failed to detect the PTS1 or PTS2 motifs 
in the pejvakin sequence (Mizuno et al., 2008), the targeting sig- 
nals for the importation of peroxisomal matrix proteins into the 
organelle (Smith and Aitchison, 2013), suggesting that pejvakin 
is a peroxisomal membrane or membrane-associated protein. 

Structural Abnormalities of Peroxisomes in the Hair
Cells of Pjvk-/- Mice

We investigated the distribution and morphology of peroxisomes
by TEM. Peroxisomes were identified on the basis of catalase
activity detection using 3,3'-diaminobenzidine as substrate.
We focused on OHCs, the first to display a dysfunction in Pjvk-/-
mice. On P30, but not on P15, both the distribution and shape of
peroxisomes differed between Pjvk-/- and Pjvk+/+ mice
(Figure 5E). In Pjvk+/+ OHCs, the peroxisomes were restricted to
an area immediately below the cuticular plate. In Pjvk-/- mice,



(J-L) Neuronal function rescue in Pjvk-/- mice by transduction with AAV8-Pjvk: effects on ABR interwave I-IV latency (J), on EEBR wave E IV amplitude and its
hypervulnerability to electrical stimulation (K), and on EEBR interwave E II-E IV latency (one example is shown in L, to be compared with H). Vertical arrows
indicate the positions of waves I and IV on ABR traces and of waves E II and E IV on EEBR traces. ns, not significant; ***p < 0.001. Error bars represent the SD.
See also Figures S1 and S2A.
Figure 3. Hypervulnerability to Sound in DFNB59 Patients

(A) ABR waves I, III, and V (vertical arrows) in one ear of a patient carrying the
PJVK p.T54I mutation, in response to 250, 500, and 1,000 impulse stimuli
(clicks) at 99 dB nHL.

(B) Repeated ABRs after 10 min of silence, with an even larger vulnerability of
waves I, III, and V.

(C and D) Distributions of the amplitude (C) and latency (D) of ABR wave V in the
tested sample of p.T54I patients (n = 8 ears), and in a control group of patients



the peroxisomes located just below the cuticular plate were
slightly larger than those in Pjvk+/+ mice. Strikingly, irregular
catalase-containing structures, some of which were juxtaposed,
were present in the perinuclear region, in the immediate vicinity
of the nuclear membrane of all Pjvk-/- OHCs, but not of
Pjvk+/+ OHCs (Figure 5E). The lack of pejvakin thus results in
peroxisome abnormalities in OHCs after the onset of hearing.

Pejvakin Is Involved in Oxidative Stress-Induced 
Peroxisome Proliferation 

In HepG2 cells, protrusions emerging from some peroxisomes,
the first step of peroxisome biogenesis from pre-existing
peroxisomes, were immunoreactive for pejvakin. String-of-beads
structures corresponding to elongated and constricted
peroxisomes, preceding final fission (Smith and Aitchison, 2013),
were also pejvakin-immunoreactive, suggesting a role for this
protein in peroxisome proliferation (Figure S6C). Peroxisomes
actively contribute to cellular redox balance, by producing and
scavenging/degrading H2O2 through a broad spectrum of
oxidases and peroxidases (especially catalase), respectively
(Schrader and Fahimi, 2006). Because Pjvk-/- mice displayed
features of marked oxidative stress in the cochlea, we
investigated the possible role of pejvakin in peroxisome proliferation
in response to oxidative stress induced by H2O2 (Lopez-Huertas
et al., 2000). Embryonic fibroblasts derived from Pjvk+/+ and
Pjvk-/- mice were exposed to H2O2 (see the Supplemental
Experimental Procedures). In unexposed cells, the number of
peroxisomes was similar between the two genotypes (t test,
p = 0.82). After H2O2 treatment, it increased by 46% in Pjvk+/+
fibroblasts (p = 0.004), but remained unchanged in Pjvk-/-
fibroblasts (p = 0.83), resulting in a statistically significant difference
between the two genotypes (p < 0.001) (Figures 5C and S7A).

We then asked whether mutations reported in DFNB59
patients also affect peroxisome proliferation. We assessed the
number of peroxisomes in transfected HeLa cells producing
EGFP alone, EGFP and murine pejvakin, or EGFP and one of
the mutated forms of murine pejvakin carrying the mutations
responsible for DFNB59 (p.T54I, p.R183W, p.C343S, or
p.V330Lfs*7). Cells producing the non-mutated pejvakin had
larger numbers of peroxisomes than cells producing EGFP
alone, whereas cells producing any of the mutated forms of
pejvakin (mutPjvk-IRES-EGFP) had smaller peroxisome numbers.
In addition, many of these cells contained enlarged
peroxisomes, a feature typical of peroxisome proliferation disorders
(Ebberink et al., 2012) (Figures 5D and S7B). Together, these
results strongly suggest that pejvakin is directly involved in the
production of new peroxisomes from pre-existing peroxisomes.

Upregulation of Pjvk Cochlear Transcription and
Peroxisome Proliferation in Response to Sound

We then asked whether pejvakin is involved in the physiological 
response to sound. We first assessed the transcription of Pjvk



(n = 13) with cochlear hearing impairment and matched ABR thresholds,
before and after exposure to clicks #250 to #1,000. Boxes extend from the 25th
to the 75th percentile. Horizontal bars and vertical bars indicate median values
and extremes, respectively. Unlike the unaffected controls, all p.T54I patients
displayed markedly decreased amplitudes and increased latencies.
and of CypA, Gpx2, c-Dct, and Mpv17, which were downregulated
in Pjvk-/- mice, in microdissected organs of Corti from
P21 wild-type mice, with or without prior sound stimulation
(5-20 kHz, 105 dB SPL for 1 hr; see the Supplemental Experimental
Procedures). Transcript levels were analyzed by qRT-PCR at
various times (1, 3, 6, and 18 hr) after sound exposure (Figure 6A).
Pjvk transcript levels had increased by factors of 1.9 ± 0.1 and
3.5 ± 0.7, mean ± SEM, after 1 and 6 hr, respectively. CypA,
c-Dct, and Mpv17, but not Gpx2, were also upregulated after 6 hr
(by factors of 6.6 ± 1.2, 4.3 ± 0.6, and 1.5 ± 0.1, respectively),
as were c-Fos and Hsp70, used as positive controls. Thus, noise
exposure leads to an upregulation of the transcription of Pjvk and
of genes downregulated in Pjvk-/- mice, and this effect is
dependent on the acoustic energy level of the stimulation (Figure S4B).

This result predicted that sound exposure would lead to 
peroxisome proliferation in the auditory system of wild-type 
mice. 6 hr after exposure (5-20 kHz, 105 dB SPL for 1 hr), 
the numbers of peroxisomes were unchanged (34.5 ± 0.8 and 
35.9 ± 1.0, mean ± SEM, per IHC from unexposed and sound- 
exposed mice, respectively, n = 75 cells from six mice; t test, 
p = 0.25). However, at 48 hr, they had markedly increased, by 
a factor of 2.3, in both IHCs and OHCs (84.7 ± 5.0 per IHC and 
16.5 ± 1.0 per OHC, n = 90 cells and n = 150 cells from six 
mice, respectively) compared to unexposed mice (36.8 ± 3.0 
per IHC and 7.3 ± 0.4 per OHC, n = 90 cells and n = 150 cells 
from six mice, respectively; t test, p < 0.0001 for both compari- 
sons). The number of peroxisomes had also increased, by 
35%, in the dendrites of primary auditory neurons (1.7 ± 0.1 
and 2.3 ± 0.2 peroxisomes per micrometer of neurite length, 
n = 40 neurites from five unexposed and five sound-exposed 
Pjvk+/+ mice, respectively; t test, p = 0.003) (Figure 6B).
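
The fold changes quoted in this paragraph follow directly from the reported mean counts, as the short sketch below illustrates. The helper names are hypothetical, and the calculation uses only the means given in the text (it ignores the SEMs and is not a statistical test).

```python
def fold_change(before: float, after: float) -> float:
    """Ratio of the post-exposure mean to the unexposed mean."""
    return after / before


def percent_change(before: float, after: float) -> float:
    """Relative change of the mean, in percent of the unexposed value."""
    return 100.0 * (after - before) / before


# Mean peroxisome counts (unexposed, 48 hr after sound exposure) from the text.
ihc = (36.8, 84.7)        # peroxisomes per IHC
ohc = (7.3, 16.5)         # peroxisomes per OHC
dendrite = (1.7, 2.3)     # peroxisomes per micrometer of neurite length

ihc_fold = fold_change(*ihc)
ohc_fold = fold_change(*ohc)
dendrite_pct = percent_change(*dendrite)

print(f"IHC fold change: {ihc_fold:.2f}")
print(f"OHC fold change: {ohc_fold:.2f}")
print(f"dendrite increase: {dendrite_pct:.0f}%")
```

Both hair-cell ratios round to 2.3, and the dendritic increase rounds to 35%, in agreement with the values stated above.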

Therapeutic Approaches in Pjvk-/- Mice

Based on these results, we tested whether the classical antiox- 
idant drug N-acetyl cysteine (NAC) (either alone or associated 
with α-lipoic acid and α-tocopherol; see the Supplemental



Figure 4. Increased Oxidative Stress and
ROS-Induced Cell Damage in the Pjvk-/-
Cochlea

(A) Reduced glutathione (GSH) (left bar chart),
oxidized glutathione (GSSG) (middle bar chart)
contents, and GSH:GSSG ratio (right bar chart) in
P21 Pjvk-/- versus Pjvk+/+ cochlea. Error bars
represent the SEM of three independent experiments.
See also Figure S3.

(B) Marked decrease in the BK α-subunit
immunolabeling in Pjvk-/- IHCs. Left: P20 Pjvk+/+ and
Pjvk-/- IHCs. Scale bar is 5 µm. Right: quantitative
analysis of BK channel clusters. Error bars
represent the SD. See also Figure S5B. *p < 0.05, ***p <
0.001.

See also Figures S3 and S5.



Experimental Procedures) administered
to Pjvk-/- pups could improve their hearing.
The ABR thresholds of P21 NAC-treated
Pjvk-/- pups (n = 21) were about
10 dB lower than those of untreated
Pjvk-/- pups (n = 24) for all frequencies tested (t test, p <
0.001 for all comparisons) (Figure 7A). The amplitude of the
ABR wave I elicited at 105 dB SPL (4.35 ± 1.16 µV, n = 21) was
the same as that of Pjvk+/+ mice (4.36 ± 1.15 µV, n = 18; t test,
p = 0.97) and greater than that of untreated Pjvk-/- mice (1.88
± 1.07 µV, n = 24; t test, p < 0.001) (Figure 7B). EEBRs were
more resistant to the high-rate electrical stimulation in treated
than in untreated mutant mice (Figure 7C). Conversely, NAC
had no beneficial effect on OHCs (data not shown). The
association of NAC with α-lipoic acid and α-tocopherol did not perform
any better (data not shown).

Full recovery of the neuronal phenotype was achieved by the
intracochlear injection of AAV8-Pjvk (see above). As hair cells
are not transduced by AAV8, we investigated whether AAV2/8,
which transduces hair cells only (Figure S2B), could rescue
the Pjvk-/- hair-cell phenotype. The auditory function of Pjvk-/-
mice (n = 7, four pups per cage in every experiment) receiving
intracochlear injections of AAV2/8-Pjvk-IRES-EGFP on P3 was
assessed on P21, and the percentage of transduced IHCs
and OHCs was evaluated in each injected and contralateral
(not injected) cochlea, on the basis of EGFP fluorescence.
Improvements in ABR thresholds of 20 to 30 dB SPL with respect
to untreated mice were observed for frequencies between 10
and 20 kHz (t test, p < 0.001 for all comparisons; Figure 7D).
Upon injection of AAV2/8-EGFP, DPOAEs, ABR thresholds,
and ABR wave I amplitude and latency were similar to those of
untreated Pjvk-/- mice (data not shown). A partial reversion of
the OHC dysfunction was obtained, with detectable DPOAEs
in pejvakin cDNA-treated cochleas (threshold 54.0 ± 10.7 dB),
but not in contralateral, untreated cochleas (Figure 7E). DPOAE
thresholds were linearly correlated (r² = 0.74, p < 0.001) with
the number of EGFP-tagged OHCs (Figure 7F), suggesting
that the normalization of DPOAE thresholds may be possible if
all OHCs could be transduced. The latency of the ABR wave I
in response to a 105 dB SPL stimulation decreased significantly
(1.38 ± 0.11 ms for the treated ears; n = 6, versus 1.53 ± 0.10 ms
Figure 5. Pejvakin Is a Peroxisome-Associated Protein Involved in the Oxidative Stress-Induced Peroxisomal Proliferation

(A and B) Immunolabeling of PMP70 and endogenous pejvakin in a HepG2 cell (A) and in two P20 Pjvk+/+ IHCs (B). See also Figure S6B.

(C) Number of peroxisomes in Pjvk+/+ and Pjvk-/- mouse embryonic fibroblasts (MEFs) subjected to 0.5 mM H2O2 versus untreated MEFs (n = 30 cells for each
condition). See also Figure S7A.

(D) Untransfected HeLa cells (NT) and transfected cells producing either EGFP alone or EGFP, together with the wild-type pejvakin (Pjvk) or a mutated Pjvk
(p.T54I, p.R183W, p.C343S, or p.V330Lfs*7). Left panel: bar chart showing the numbers of peroxisomes per cell 48 hr after transfection. There were on average

(legend continued on next page) 
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Figure 6. Effect of Exposure to Loud Sounds 
on the Cochlear Expression of Pjvk and the 
Number of Peroxisomes in Cochlear Hair 
Cells and Ganglion Neurons 

(A) Pjvk, c-Dct, CypA, Mpv17, and Gpx2 transcript 
levels assessed by qRT-PCR in P21 Pjvk^'^ organ 
of Corti 1, 3, 6, and 18 hr after sound exposure 
(5-20 kHz, 1 05 dB SPL for 1 hr). The levels of c-Fos 
and Hsp70 transcripts were used as positive con- 
trols. See also Figure S4B. 

(B) Peroxisome proliferation in P21 Pjvk^'^ hair 
cells and cochlear ganglion neurons after sound 
exposure (same conditions as in A). Peroxisomes 
were counted 48 hr after sound exposure. OHCs, 
IHCs, and neuronal processes stained for F-actin, 
myosin VI, and neurofilament protein NF200, 
respectively. In OHCs and IHCs, the peroxisomes 
are located below the CP and throughout the 
cytoplasm, respectively. For OHCs, both a lateral 
view and a transverse optical section at the level of 
CP (scheme on the right) are shown. The number of 
peroxisomes was increased in OHCs, IHCs, and 
dendrites after sound exposure. N, cell nucleus. 
***p < 0.001. Error bars represent the SEM. Scale 
bars are 5 µm.

See also Figure S4 and Table S1.
for the contralateral, untreated ears; paired t test, p = 0.03)
(Figure 7G), and its amplitude increased into the normal range
(7.34 ± 0.80 µV versus 2.93 ± 0.92 µV; paired t test, p < 0.001)
(Figure 7H), in relation to the number of EGFP-tagged IHCs
(r² = 0.89 for wave I amplitude, p < 0.001; Figure 7I). No correction
of the interwave I-IV latency was observed, as expected
(data not shown).

Finally, we investigated the effect of the transduction of
Pjvk−/− IHCs by AAV2/8-Pjvk-IRES-EGFP on their peroxisomes.
Before sound exposure, the numbers
of peroxisomes in IHCs of P21 Pjvk−/−
and AAV2/8-Pjvk-IRES-EGFP-injected
Pjvk−/− mice did not differ from that of
Pjvk+/+ mice (30.5 ± 1.9, 32.3 ± 2.1, and
36.8 ± 3.0 peroxisomes, mean ± SEM
per IHC, n = 60 cells from four Pjvk−/−
and four AAV2/8-Pjvk Pjvk−/− mice, and
n = 90 cells from six Pjvk+/+ mice, respectively;
t test, p = 0.11 and p = 0.30, respectively).
By contrast, 48 hr after sound
exposure (5-20 kHz) at 105 dB SPL for
1 hr, the number of peroxisomes had
decreased by 63% in Pjvk−/− IHCs (30.5 ± 1.9 and 11.2 ± 1.3
peroxisomes per IHC, n = 75 cells from five unexposed and
five sound-exposed Pjvk−/− mice, respectively; t test, p <
0.0001), and enlarged PMP70-labeled structures were present
close to the nucleus (Figure 7J). In response to the
same sound but of a lower intensity, i.e., 97 dB SPL for 1 hr,
the number of peroxisomes was unchanged in Pjvk−/− IHCs
(30.5 ± 1.9 and 34.6 ± 2.3 peroxisomes per IHC, n = 60 cells
from four unexposed and four sound-exposed Pjvk−/− mice,



33% more peroxisomes in cells producing both EGFP and Pjvk (n = 200) than in cells producing EGFP alone (n = 150). Right panel: for every range of enlarged
peroxisome size, x (0.6-0.8 µm, 0.8-1.0 µm, and >1.0 µm), in two perpendicular directions, the proportion of cells containing at least one peroxisome. See also
Figure S7B.

(E) Abnormalities in shape and distribution of peroxisomes in mature Pjvk−/− OHCs detected by TEM (P30 Pjvk−/− [middle and right] and Pjvk+/+ [left] OHCs).
Insets [middle panel]: enlarged views of individual peroxisomes. In Pjvk+/+ OHCs, peroxisomes are grouped just under the cuticular plate (CP) (arrowheads), with
none detected in the perinuclear region (n = 33 sections, upper bar chart). In Pjvk−/− OHCs, some peroxisomes remain under the CP (arrowheads), but catalase-
containing structures, misshapen peroxisomes (arrows), are detected in the perinuclear region (n = 24 sections, upper bar chart). Peroxisomes located under the
CP are larger in Pjvk−/− OHCs (n = 92 peroxisomes) than in Pjvk+/+ OHCs (n = 89 peroxisomes) (lower bar chart). N, cell nucleus. **p < 0.01, ***p < 0.001. Error bars
represent the SEM. Scale bars are 5 µm in (A) and (B) and 0.5 µm in (E).

See also Figures S6 and S7. 
Figure 7. Therapeutic Approaches in Pjvk−/− Mice

(A-C) Effect of N-acetyl cysteine (NAC) on auditory function in Pjvk−/− mice. (A) ABR thresholds in untreated versus NAC-treated P21 Pjvk−/− mice. (B) ABR wave I
amplitude for 10 kHz tone bursts in Pjvk+/+ and untreated Pjvk−/− versus NAC-treated Pjvk−/− mice at P21. (C) EEBR wave E IV amplitude before (dots) and after
(crosses) controlled electrical stimulation of the cochlear nerve at 200 impulses/s for 1 min in Pjvk+/+, untreated Pjvk−/−, and NAC-treated Pjvk−/− mice.

(D-I) Effect of AAV2/8-Pjvk-IRES-EGFP transferred into the cochlear hair cells on the auditory function of Pjvk−/− mice. See also Figure S2B. (D) ABR thresholds at
10, 15, and 20 kHz in AAV2/8-Pjvk-IRES-EGFP-treated versus untreated Pjvk−/− mice. (E and H) DPOAE threshold (E) and ABR wave I amplitude (H) at 10 kHz in

respectively; t test, p = 0.17), and no enlarged PMP70-stained
structures were detected (data not shown). The absence of pej- 
vakin thus resulted in defective sound-induced peroxisomal pro- 
liferation (both at 105 dB SPL and 97 dB SPL) and, even, in 
peroxisome degeneration (at 105 dB SPL) in IHCs. In Pjvk−/−
mice injected with AAV2/8-Pjvk-IRES-EGFP on P3 and exposed
to 105 dB SPL for 1 hr on P21, enlarged PMP70-labeled structures
were no longer detected in transduced IHCs, and the number
of peroxisomes increased by 35% (32.3 ± 2.1 and 43.7 ± 3.0
peroxisomes per IHC, n = 60 cells from unexposed and exposed
transduced Pjvk−/− IHCs, respectively; t test, p = 0.002) (Figure 7J).
We conclude that pejvakin re-expression fully protects
Pjvk−/− IHCs from the degeneration of peroxisomes and
partially restores their impaired adaptive proliferation.
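
The percentage changes quoted for the peroxisome counts can be checked directly from the reported group means. The short sketch below does only that arithmetic; the function name `percent_change` is ours and is not part of the study's analysis pipeline.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Mean peroxisomes per IHC reported in the text.
loss_untreated = percent_change(30.5, 11.2)   # Pjvk-/- IHCs, 105 dB SPL exposure
gain_transduced = percent_change(32.3, 43.7)  # AAV2/8-Pjvk-transduced Pjvk-/- IHCs

print(round(loss_untreated))   # -63
print(round(gain_transduced))  # 35
```

Both values match the 63% decrease and 35% increase stated above.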

DISCUSSION 

Noise-induced hearing loss (NIHL) is the second most common 
form of sensorineural hearing impairment after presbycusis in 
the United States (Dobie, 2008). Here, we describe a genetic 
form of NIHL, by showing that pejvakin deficiency in mice and 
DFNB59 patients leads to hypervulnerability to sound, due to a 
peroxisomal deficiency. To our knowledge, a peroxisomal cause 
of an isolated (non-syndromic) form of inherited deafness has not 
been reported yet. The peroxisome emerges as a key organelle in 
the redox homeostasis of the auditory system, for coping with the 
overproduction of ROS induced by high levels of acoustic energy. 

Acoustic energy is the main determinant of NIHL. The Lex,8 hr
(for level of exposure over an 8-hr workshift) index has been
defined such that an Lex,8 hr of X dB delivers the same energy
as a stable sound of X dB played over a period of 8 hr. Chronic
occupational exposures to less than 85 dB (or 80 dB, depending
on the country) are deemed safe. In Pjvk−/− mice, a single exposure
to 63 dB Lex,8 hr increased hearing thresholds by 30 dB, with
full recovery occurring after about 2 weeks. By contrast, a ten
times more energetic exposure to an Lex,8 hr of 73 dB in wild-type
mice of the same strain produces only an 18 dB shift in
threshold, with a recovery time of 12 hr (Housley et al., 2013).
This hypersensitivity of Pjvk−/− mice to noise suggests that the
Lex,8 hr of about 83 dB for a cage of ten pups is sufficient to account
for permanent hearing loss in these Pjvk−/− pups, while
some of those housed in small numbers in quiet rooms can
display near-normal hearing thresholds (see Figure 1C). Likewise,
the auditory function of DFNB59 patients was transiently affected
by a 57 dB Lex,8 hr exposure, routinely used in ABR tests.
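
The "ten times more energetic" comparison follows from the logarithmic definition of the decibel: a 10 dB difference corresponds to a tenfold ratio of acoustic energy. The sketch below illustrates this, together with the usual equal-energy normalization of an exposure of arbitrary duration to an 8-hr reference. The function names are ours, frequency weighting is ignored, and the snippet is an illustration of the index rather than the procedure used to compute the exposure levels in the study.

```python
import math


def energy_ratio(level_a_db, level_b_db):
    """Ratio of acoustic energies for two levels delivered over equal durations."""
    return 10 ** ((level_a_db - level_b_db) / 10)


def lex_8h(level_db, duration_hr):
    """Equal-energy level normalized to 8 hr (assumes a steady sound, no weighting)."""
    return level_db + 10 * math.log10(duration_hr / 8.0)


print(energy_ratio(73, 63))        # 10.0: the 73 dB exposure carries ten times the energy
print(round(lex_8h(85, 8.0), 2))   # 85.0: an 8-hr exposure maps onto itself
print(round(lex_8h(105, 1.0), 2))  # 95.97: 1 hr at 105 dB carries the energy of 8 hr at about 96 dB
```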

NIHL involves the excessive production of ROS, overwhelming 
the antioxidant defense system and causing irreversible oxidative 
damage to DNA, proteins, and lipids within the cell (Henderson 
et al., 2006). Noise-induced oxidative stress results in the production
of H2O2 and other ROS as by-products, thought to derive
from the intense solicitation of mitochondrial activity, and several 



mouse mutants with mitochondrial defects are prone to NIHL (Oh- 
lemiller et al., 1999; Brown et al., 2014). Our studies of pejvakin- 
deficient mouse mutants and rescue experiments targeting the 
hair cells and auditory neurons unambiguously show that IHCs, 
OHCs, primary auditory neurons, and neurons of the cochlear nu- 
cleus are hypervulnerable to sound in the absence of pejvakin, 
which is consistent with previous results showing that hair cells 
and neurons of the auditory system are targets of NIHL (Wang 
et al., 2002; Kujawa and Liberman, 2009; Imig and Durham, 
2005). However, our study goes one step further by implicating 
a possible common mechanism: peroxisomal failure, the impor- 
tance of which is demonstrated by the impairment of the redox 
homeostasis caused by pejvakin deficiency. It also reveals a ma- 
jor cause of the unusually high level of phenotypic variability 
observed in pejvakin-deficient mice and humans: the difference 
in sound exposure and the inability of the peroxisomes to cope 
with the resultant activity-dependent oxidative stress in the 
absence of pejvakin. Incidentally, this can account for the 
apparent paradox that mice carrying the R183W mutation in pej- 
vakin displayed a much more severe neural pathway defect than 
the Pjvk−/− mice (Delmaghani et al., 2006). Due to the preservation
of hair cell functions, the auditory neurons of R183W mutant
mice should be strongly stimulated, whereas the early permanent
damage to cochlear hair cells in Pjvk−/− mice acts as a protective
“muffler” of the neuronal pathway. 

In mammals, the number and metabolic functions of peroxi- 
somes differ between cell types. However, all cell types are 
able to adapt rapidly to modifications in physiological conditions 
by changing the number, shape, size, and molecular content of 
peroxisomes, resulting in considerable functional plasticity of 
these organelles (Schrader et al., 2012; Smith and Aitchison, 
2013). Our experiments on Pjvk−/− and Pjvk+/+ mouse embryonic
fibroblasts stressed with H2O2 showed that pejvakin is critically
involved in the oxidative stress-induced proliferation of peroxi- 
somes through growth and fission of pre-existing peroxisomes. 
The molecular machinery underlying this adaptive process is still 
poorly understood beyond the involvement of Pex11α (Li et al.,
2002). Of note, the absence of pejvakin only affects the prolifer- 
ation of peroxisomes from pre-existing peroxisomes, but not the 
constitutive biogenesis of this organelle. Accordingly, structural 
abnormalities of peroxisomes in Pjvk−/− mice became apparent
only after hearing onset, in the context of the oxidative stress 
produced by noise exposure. By contrast, the PEX gene defects 
causing Zellweger syndrome spectrum (ZSS) disorders (Water- 
ham and Ebberink, 2012) and rhizomelic chondrodysplasia 
punctata affect the constitutive biogenesis of peroxisomes. 
Hearing impairment in ZSS disorders involves a severe impair- 
ment of neuronal conduction and has been attributed to defects 
in the synthesis of two essential myelin sheath components— 
plasmalogens and docosahexaenoic acid— which is critically 
dependent on peroxisomes. Our results suggest that ZSS also 



treated versus untreated contralateral ears. (G) ABR wave I latency in treated versus untreated contralateral ears. (F) Correlation between DPOAE thresholds and 
the proportion of EGFP-tagged (i.e., transduced) OHCs. Six untreated ears have no recordable DPOAE (threshold arbitrarily set at 80 dB SPL; red diamond). (I) 
Correlation between ABR wave I amplitude at 10 kHz, 105 dB SPL, and the percentage of transduced IHCs (EGFP tagged). 

(J) Effect of AAV2/8-Pjvk-IRES-EGFP on the peroxisomes in Pjvk−/− IHCs. Upper and lower panels show and quantify (bar charts) the peroxisomes in untreated
mice 48 hr after sound exposure (5-20 kHz, 105 dB SPL for 1 hr) (peroxisome abnormalities are indicated by arrowheads). Error bars represent the SD in (A-I) and
the SEM in (J). ns, not significant; *p < 0.05, **p < 0.01, ***p < 0.001.
includes a defective redox balance in the hair cells and neurons
of the auditory system. 

In the context of noise exposure, the upregulation of Pjvk transcription
in the cochlea and the subsequent peroxisome prolifer-
ation in the hair cells and auditory neurons of wild-type mice 
suggest that pejvakin-dependent peroxisome proliferation in 
the auditory system is part of the physiological response to 
high levels of acoustic energy that result in increased amounts 
of ROS. This and the marked oxidative stress detected in the 
Pjvk−/− cochlea imply that the proliferation of peroxisomes plays
an antioxidant role, similar to that reported in other cell types 
(Santos et al., 2005; Diano et al., 2011). The rapid elevation of 
the hearing threshold in Pjvk−/− mice in response to low-energy
sounds and the increase in interwave I-IV latency observed in
DFNB59 patients within a few seconds are consistent with an
activity-dependent H2O2 production that, due to impaired cellular
redox homeostasis, results in concentrations of H2O2 high
enough to impact on the activity of various target proteins 
including ion channels and transporters (Rice, 2011). The wors- 
ening of hearing sensitivity, 2 days later, in the mutant mice lack- 
ing pejvakin, exacerbated by putting back the mice in a noisy 
environment, fits the picture of the absence of sound-induced 
biogenesis of peroxisomes (with their degeneration occurring 
in a high acoustic energy environment). We thus conclude that 
the hypervulnerability of Pjvk−/− mice and DFNB59 patients to
sound does not result simply from an exacerbation, by sound, 
of a pre-existing redox-balance defect, but is the consequence 
of impaired adaptive proliferation of peroxisomes in the absence 
of pejvakin. Both defective peroxisome proliferation in IHCs of 
Pjvk−/− mice in response to sound exposure and its partial
recovery by pejvakin cDNA transfer support this conclusion. A
full recovery of the adaptive peroxisome proliferation produced 
by sound exposure may require higher concentrations of pejvakin
or the sound-induced modulation of Pjvk transcription (see
Figure 6A), which was missing in our rescue experiments (pejva- 
kin cDNA expression being driven by a constitutive promoter). 

In patients with hearing impairment, the amplification of sound 
by hearing aids or direct electrical stimulation of the auditory 
nerve by a cochlear implant delivers a stimulus with an energy 
level similar to that shown here to worsen the hearing impairment 
of Pjvk−/− mice within 1 min of sound exposure. Therefore, in
cases of peroxisomal deficiency, as in DFNB59, specific protec- 
tion against redox homeostasis failure is essential. Patients with 
such conditions should avoid noisy environments and a benefi- 
cial effect of hearing devices should require an antioxidant 
protection. N-acetyl cysteine was the only antioxidant drug 
tested here to display some, albeit limited, efficacy. By contrast, 
AAV-mediated gene therapy could potentially provide full pro- 
tection. Finally, deciphering the sound-stress-induced protec- 
tive signaling pathway involving pejvakin might lead to the 
discovery of therapeutic agents for NIHL. 

EXPERIMENTAL PROCEDURES 
Audiological Studies in Mice 

Auditory tests were performed in an anechoic room, on anesthetized animals
whose core temperatures were maintained at 37°C (see the Supplemental
Experimental Procedures).



Audiological Tests in Patients

Informed consent was obtained from all the subjects included in the study.
Pure-tone audiometry was performed with air- and bone-transmitted tones.
Hearing impairment was assessed objectively, by measuring ABRs and
transient-evoked otoacoustic emissions (TEOAEs). The nonlinear TEOAE
recording procedure was used (derived from the ILO88 system), making it
possible to extract TEOAEs from linear reflection artifacts from the middle
ear, and to evaluate background noise. TEOAE responses were analyzed in
1-kHz-wide bands centered on 1, 2, 3, and 4 kHz.

Generation of an Anti-pejvakin Monoclonal Antibody

The 3' end of the coding sequence of the Pjvk cDNA (NCBI:NM_001080711.2)
was inserted into a pGST-parallel-2 vector (derived from pGEX-4T-1; Amersham).
The resultant construct, encoding the C-terminal region of pejvakin
(residues 290-352; RefSeq:NP_001074180.1) fused to an N-terminal glutathione
S-transferase tag, was introduced into Escherichia coli BL21-Gold
(DE3)-competent cells (Stratagene). The pejvakin protein fragment was puri- 
fied on a glutathione-Sepharose 4B column, then subjected to size-exclusion 
chromatography and used as the antigen for immunization. Antibodies were 
produced by immunizing Pjvk−/− mice. An immunoglobulin G monoclonal antibody
(Kd of 6 × 10⁻⁸ M), Pjvk-G21, was selected by ELISA on immunogen-
coated plates. 

Statistical Analyses 

Quantitative data are presented as mean ± SD, unless otherwise mentioned.
Statistical analyses were performed using GraphPad. Data were analyzed by 
paired or unpaired Student’s t tests and, for multiple comparisons, either by 
one-way or two-way ANOVA or by t tests with the Bonferroni correction. Sta- 
tistical significance of the differences observed between groups is defined as 
p < 0.05. 
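
For readers unfamiliar with the Bonferroni correction mentioned above, the sketch below shows the adjustment in its simplest form: each p value is multiplied by the number of comparisons (capped at 1) before being compared with the 0.05 threshold. The p values in the example are hypothetical and are not taken from the study; the actual analyses were run in GraphPad.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each p value in a family of tests."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical family of three t tests (illustrative values only).
for p_adj, significant in bonferroni([0.01, 0.02, 0.04]):
    print(round(p_adj, 3), significant)
# 0.03 True
# 0.06 False
# 0.12 False
```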
SUMMARY

ESCRT-III is required for lipid membrane remodeling 
in many cellular processes, from abscission to viral 
budding and multi-vesicular body biogenesis. How- 
ever, how ESCRT-III polymerization generates mem- 
brane curvature remains debated. Here, we show 
that Snf7, the main component of ESCRT-III, poly- 
merizes into spirals at the surface of lipid bilayers. 
When covering the entire membrane surface, these 
spirals stopped growing when densely packed: 
they had a polygonal shape, suggesting that lateral 
compression could deform them. We reasoned that 
Snf7 spirals could function as spiral springs. By 
measuring the polymerization energy and the rigidity 
of Snf7 filaments, we showed that they were 
deformed while growing in a confined area. Further- 
more, we observed that the elastic expansion of 
compressed Snf7 spirals generated an area differ- 
ence between the two sides of the membrane and 
thus curvature. This spring-like activity underlies 
the driving force by which ESCRT-III could mediate 
membrane deformation and fission. 

INTRODUCTION 

ESCRT-III (endosomal sorting complex required for transport) 
has been implicated in the formation of intralumenal vesicles 
(ILVs) during biogenesis of multi-vesicular bodies (MVBs) by ge- 
netic (Babst et al., 2002; Coonrod and Stevens, 2010) and 
biochemical assays (Adell et al., 2014; Henne et al., 2012; Sak- 
sena et al., 2009; Wollert and Hurley, 2010; Wollert et al., 2009). 
ESCRT-III budding occurs in an opposite direction than in endo- 
cytosis: the limiting membrane is pushed outward from the cyto- 
plasm instead of curving inward. ESCRT-III has been proposed to 
play a role in membrane deformation (Hanson et al., 2008) and 
fission of ILVs (Adell et al., 2014). Consistent with this, ESCRT- 
III is also required for geometrically similar fission reactions 
such as viral budding (von Schwedler et al., 2003) and abscission
during cytokinesis (Carlton et al., 2008; Elia et al., 2011; Guizetti
et al., 2011). ESCRT-III nucleation is promoted by ESCRT-II and 
its disassembly by the ATPase Vps4 (Lata et al., 2008). 

It is unclear how ESCRT-III deforms lipid membranes. 
Because of their polymerization abilities, ESCRT-III proteins 
(Vps20, Snf7, Vps2, Vps24) have been proposed to generate 
membrane curvature by scaffolding (Cashikar et al., 2014; Fab- 
rikant et al., 2009; Hanson et al., 2008; Lata et al., 2008). In this 
mode, polymers coating the membrane usually adopt a single 
specific shape, or, at least, a set of geometrically similar shapes. 
ESCRT-III filaments adopt instead a wide variety of shapes in vivo 
and in vitro: concentric circles, rings, spirals, helices, or linear fil- 
aments have been observed (Hanson et al., 2008; Henne et al., 
2012; Pires et al., 2009). Furthermore, no unique shape for the 
assembly of ESCRT-III proteins arises from the molecular structure
of ESCRT-III proteins (McCullough et al., 2013). Instead, curvature
could be generated by other mechanisms: for example, it
has been proposed that the amphipathic insertion of the N-termi- 
nal part of Snf7 could participate in the generation of membrane 
curvature (Buchkovich et al., 2013). We were thus interested in 
studying how ESCRT-III polymerization could drive membrane 
curvature. 

RESULTS 

Growth of Snf7 Patches on Supported Bilayers 

To study the polymerization of ESCRT-III, we reconstituted 
ESCRT-III polymerization by adding purified yeast Snf7 onto 
supported lipid membranes. Supported membranes were ob- 
tained by bursting giant unilamellar vesicles (GUVs) composed 
of 40% di-oleoyl-phosphatidylserine (DOPS) and 60% of di- 
oleoyl-phosphatidylcholine (DOPC) on cleaned glass coverslips 
(Figure S1A, Movie S1). These coverslips were built into a flow
chamber, allowing sequential addition and exchange of 
solutions. 

First, Snf7 labeled with Alexa488 (Snf7-Alexa488) was flushed 
into the chamber, and its association to the membrane was 
imaged by time-lapse spinning-disk confocal microscopy 
(SDC). At 400 nM, Snf7 formed patches evenly distributed on 
Figure 1. Nucleation and Growth of Snf7 Patches on Supported Membranes

Lipid composition is DOPC 60% / DOPS 40% + Rhodamine PE 0.1%.

(A) Time-lapse images of Snf7-Alexa488 patch growth (green) at [Snf7] = 400 nM on supported membrane (gray).

(B) Time-lapse images (every 10 min) of a single Snf7-Alexa488 patch (green) growing at [Snf7] = 200 nM.



the membrane surface (Figure 1A). These circular patches (Figure 1B)
grew over the course of an hour until becoming confluent,
eventually covering the membrane completely. 

The nucleation rate of the patches depended on Snf7 bulk 
concentration (Figure 1C). Patch formation was not observed 
below 200 nM. Above 1 µM, patch formation and growth was
so fast that individual patches could hardly be discriminated. Be- 
tween these limiting concentrations, we were able to follow 
micron-sized patches individually over several tens of minutes 
(Movie S2). Once formed, patches disassembled with a half 
time of approximately 15 hr upon Snf7 washout (Figure S1B;
Movie S3). 

We termed “patch nucleation” this nucleation of patches in 
the absence of a previous Snf7 structure. The patch nucleation 
rate was very low, less than 1 seed·µm⁻²·hour⁻¹, and depended
on the amount of negatively charged lipids in the membrane (Figure S1C),
revealing the critical role of these lipids in promoting
Snf7 polymerization. We did not observe Snf7 assemblies in 
the absence of membranes. 

The periphery of the patches showed dimmer Snf7 fluores- 
cence than the center: fluorescence decayed radially at the rim 
over the outer 4 µm. This gradient of fluorescence was the
same as the patch grew in radius (Figure 1D) and independent
of the bulk concentration of Snf7 (Figure 1E). The central part of
the patches had the same intensity, constant over time. We 
postulated that the patch could be made of two parts: a central 
part where Snf7 entirely covers the membrane and cannot further 
assemble, and a rim, representing a growing front. To study 
whether Snf7 was assembled only in the front region, we generated
patches with a solution of Snf7-Alexa488, which we then replaced
with Snf7-Atto647N (Figure 1F and Movie S4). As postulated,
Snf7-Atto647N fluorescence appeared only at the border
of the growing patches. These observations confirmed that 
Snf7 patches were growing by a traveling circular front. 

The front propagated at constant speed (Figure 1G). The 
speed was linear with Snf7 concentration (Figure 1H). As a result,
and because the fluorescence gradient was independent from 
Snf7 concentration, fluorescence intensity curves with time at 
a given point (Figure S1E) could be merged into a single one 
by rescaling time with Snf7 concentration (Figure S1F). The 
amount of negatively charged lipids also affected the front speed 
(Figure S1D). In summary, the growth of the Snf7 patches reflected
a nucleation/growth process, with a nucleation rate of
less than 1 seed·µm⁻²·hour⁻¹ and a radial growth speed of
760 nm·min⁻¹·µM⁻¹.
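
Because the front speed scales linearly with Snf7 concentration, the reported slope of 760 nm·min⁻¹·µM⁻¹ is enough to estimate how fast a patch expands at a given bulk concentration. The sketch below applies that linear relation; the function names and the assumption that the relation holds exactly over the whole growth period are ours.

```python
SLOPE_NM_PER_MIN_PER_UM = 760.0  # slope of the linear fit reported in the text


def front_speed_nm_per_min(snf7_uM):
    """Radial front speed (nm/min) at a given Snf7 bulk concentration (in µM)."""
    return SLOPE_NM_PER_MIN_PER_UM * snf7_uM


def patch_radius_um(snf7_uM, minutes, start_radius_um=0.0):
    """Patch radius (µm) after `minutes` of growth at constant front speed."""
    return start_radius_um + front_speed_nm_per_min(snf7_uM) * minutes / 1000.0


print(round(front_speed_nm_per_min(0.4)))  # 304 nm/min at 400 nM
print(round(patch_radius_um(0.4, 10), 2))  # 3.04 µm after 10 min
print(round(patch_radius_um(0.2, 10), 2))  # 1.52 µm after 10 min at 200 nM
```

At 400 nM a patch would thus reach a radius of a few micrometers within about 10 min, in line with the micron-sized patches followed over tens of minutes.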

Since Snf7 filaments can curl into circles or spirals (Hanson 
et al., 2008; Henne et al., 2012; Shen et al., 2014), we reasoned 
that one patch could be made of a single spiral filament growing 
from its tips at constant rate. In this case, however, the radial 



growth speed of the patch should slow down as the inverse square
root of time, and a dimmer fluorescence at the periphery of the
patches would not be expected. To resolve this apparent contra- 
diction, we studied the molecular structure of the Snf7 patches 
with atomic force microscopy (AFM) and electron microscopy 
(EM). 
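
The square-root argument can be made explicit. If a single spiral filament elongates from its tip at a constant rate, the covered area grows linearly in time, A = w·t, so the radius grows as R = sqrt(w·t/π) and the radial speed dR/dt = 0.5·sqrt(w/(π·t)) keeps falling. The sketch below evaluates both; the value of w is an arbitrary placeholder chosen for illustration, not a measured quantity.

```python
import math


def single_spiral_radius(t_s, w_nm2_per_s):
    """Radius (nm) of one spiral whose area grows linearly, A = w * t."""
    return math.sqrt(w_nm2_per_s * t_s / math.pi)


def single_spiral_radial_speed(t_s, w_nm2_per_s):
    """Radial speed dR/dt (nm/s) of the same spiral."""
    return 0.5 * math.sqrt(w_nm2_per_s / (math.pi * t_s))


W = 400.0  # hypothetical area gain in nm^2/s, for illustration only

# Quadrupling the elapsed time only doubles the radius and halves the radial speed.
print(single_spiral_radius(400, W) / single_spiral_radius(100, W))
print(single_spiral_radial_speed(400, W) / single_spiral_radial_speed(100, W))
```

A front advancing at constant speed, as observed, would instead give a radius that grows linearly in time, which is why a single growing spiral cannot account for the data.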

Snf7 Patches Are Made of Spiraling Filaments with 
Lateral Interactions 

We first acquired images of Snf7 patches by AFM. GUVs 
composed of 60% DOPC and 40% DOPS were burst on a 
mica support (Figure S2A). After a 4-hr incubation of
Snf7 at 1 µM, AFM images revealed that the micron-sized Snf7
patches consisted of packed arrays of Snf7 circular assemblies 
(Figure 2A; Figure S2B). Each assembly was formed by concen- 
tric circle-like structures. However, in these packed conditions, 
rather than being perfectly circular, each assembly was 
deformed into polygons with six neighbors on average (Figure 2B).
The average external radius was 123 ± 35 nm (in the
following, values are mean ± SD unless otherwise noted; n =
295) (Figure 2C), and the innermost circle had an average radius
of 18 ± 3 nm (n = 120) (Figure 2D). The average distance between
successive circles was b = 17 ± 3 nm (n = 80) (Figure 2E).

To study the structure of these Snf7 assemblies, we performed 
negative stain electron microscopy (EM) of large unilamellar ves- 
icles (LUVs) coated with Snf7 upon incubation for 15 min in a 
1 µM Snf7 solution. LUVs coated with circular structures were
observed, consistent with AFM images (Figure 2F). The fine 
structure of the filaments remained difficult to see because the 
two hemispheres of the LUV are projected onto the same EM im- 
age. However, in many cases, LUVs that had adhered on the grid 
were flushed during staining, leaving Snf7 assemblies attached 
to the grid surface. Two kinds of structures were then observed: 
small rings (27 ± 4 nm average radius; n = 61; Figure 2G) and
large circular assemblies (R = 110 ± 40 nm; n = 46; Figure 2H).

Small rings are composed of filaments with two different thick- 
nesses (Figure 2G and Figure S2C). The thinner ones appeared 
single stranded with an approximate thickness of 4.5 ± 0.3 nm 
(n = 10) in agreement with previous data (Pires et al., 2009; 
Shen et al., 2014). The thicker ones were double-stranded, 
with approximately twice the thickness (10.7 ± 0.7 nm; n = 7). 
The larger circular assemblies had an average radius of 110 ± 
40 nm (n = 46). Following the path of the innermost filament in 
these assemblies revealed that they were made of a single fila- 
ment, self-organized into a spiral (Figure 2H; Figure S2D). Within 
the spiral, we observed that the filament could associate laterally 
with itself, forming double-stranded filaments. These images 
showed that the spiraling nature of Snf7 filaments previously 
observed in solution (Shen et al., 2014) is conserved at the 



(C) Patch nucleation rate as a function of [Snf7]. 

(D) Successive (from bright to dark green, every 10 min) Snf7 patch fluorescence profiles (circularly averaged) at [Snf7] = 200 nM. 

(E) Snf7 patch edge fluorescence profile (average of 3 patches) as a function of [Snf7] (data for [Snf7] < 200 nM were obtained by first nucleating the patches at 
350 nM for 5 min, and then [Snf7] was reduced to the desired value). 

(F) Exchange of bulk Snf7-Alexa488 (green) with Snf7-Atto647N (red) at 200 nM. Inset: kymograph of the region selected (yellow box). The green line is the switch 
between green and red Snf7. 

(G) Equatorial kymograph of the patch shown in B. 

(H) Patch radial growth speed as a function of [Snf7]. The slope of the linear fit (gray line) is 760 nm·min⁻¹·µM⁻¹.
See also Figure S1.
Figure 2. Patches Are Made of Packed Snf7 Spirals

(A) AFM topographic image of the center of a Snf7 patch.

(B) Histogram of the number of neighbors per assembly.

(C) Histogram of the average outer radii of Snf7 assemblies. The average radius is 123 nm ± 33 nm.

(D) Histogram of the innermost circle radii. The average radius is 18 nm ± 3 nm.

(E) Histogram of the inter-circle distance. The average distance is b = 17 nm ± 3 nm.

(F) TEM image of a negatively stained, Snf7-coated LUV.



membrane. It also suggests that previously reported circular as-
semblies could also be spirals: in vitro, a Snf7 mutant protein that 
polymerizes spontaneously onto membranes (Henne et al., 
2012) or, in fibroblasts, the circular structures found upon over- 
expression of CHMP4A and CHMP4B (Hanson et al., 2008). The 
formation of rings suggests that Snf7 filaments grow with a 
preferred curvature and the formation of spirals suggests that
Snf7 filaments can depart from this preferred curvature with 
some flexibility. 

The AFM images also revealed that filaments could split within 
the disk assemblies (Figure 2I): thick filaments that appeared as 
concentric circles are actually interconnected by thinner fila- 
ments (Figure 2J, arrows). This is consistent with the spiral struc- 
ture seen by EM: double-stranded parts are combined with sin- 
gle-stranded connections (Figure 2H). The thinner filaments had 
an average thickness of 4.9 ± 1.5 nm (n = 25), probably corresponding
to single strands, whereas thicker ones had a thick-
ness of 10.6 ± 1.2 nm (n = 25), consistent with double-strands. 
The AFM and EM analysis suggested a structure of the Snf7 as- 
semblies, where a single spiral filament interacted laterally with 
itself to form double-stranded filaments. Occasionally, spirals 
were directly observed by AFM (Figure 2I, subpanel 6).

These observations prompted the question of how Snf7 
patches were formed. Patch nucleation could, for instance, start 
from a single closed ring, like those seen by EM. It is conceivable 
that such rings could be prone to break open, thus freeing fila- 
ment tips that could further grow into a spiral. How would then 
this initial spiral transform into a patch? A possible scenario 
consists of a two-step growth mechanism (Figure 3A). First, 
new spirals are nucleated in the vicinity of existing spirals 
(termed below spiral nucleation). Rupture of filaments would 
separate the newly formed spirals from the initial spiral. Second, 
these spirals would grow independently through the addition of 
monomers at their filament tips. This scenario accounts for the 
observed growth dynamics of Snf7 patches: the constant speed 
of the radial growth of the patches implies that the density of 
growing filament tips at their rim stays constant. The formation 
of new spirals generates new tips, maintaining a constant density 
of growing tips. 

To explore quantitatively the implications of such a scenario, we
developed a mathematical description of the dynamics of sur- 
face coverage by growing Snf7 spirals (Figure 3B and Figure S3). 
In the model, Snf7 spirals are represented by hard disks depos- 
ited on a surface representing a small (micron-sized) piece of 
membrane. As initial conditions, a few disks with radius r₀ are
present, corresponding to events of initial patch nucleation. 
Patch nucleation is then neglected (set to zero) during the rest 
of the dynamics. New disks are thus generated only by spiral 
nucleation. 

Disk growth corresponds to an area gain w per unit time
(in nm²·s⁻¹), as expected if the Snf7 filaments are elongating



from their tips at a constant speed w/b (in nm.s“\ where b = 
17 nm is the distance between Snf7 filaments in a single spiral). 
As the Snf7 spiral grows, its perimeter increases, offering an 
increasing number of potential spiral nucleation sites. We model 
this by stochastic nucleation of new disks with radius Tq at a rate 
where ^ denotes the total perimeter of the existing disks, 
and A a constant spiral nucleation rate (expressed in number of 
nucleation events per second per micrometer). In the model, 
nucleation is prevented if the new disk location is already occu- 
pied by an existing disk. In addition, both nucleation and 
polymerization stop when the surface is completely covered 
with disks (Figure 3C). Solving the model in a mean-field approx- 
imation using the value obtained experimentally for Tq gives a 
final distribution of disk sizes with one unknown parameter 
{w/X) (Supplemental Information, Supplemental Mathematical 
Modeling part 1). Fitting (vv/A) = 9.8 + 1 .5x10“^ iim^, we find 
the distribution in excellent agreement with the experimental 
size distribution (Figure 3D). 
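
The growth and nucleation rules of the disk model can be made concrete with a minimal sketch. It assumes only what the text states: disk area grows by w per unit time from an initial radius r₀, filament tips advance at w/b, and new disks appear at a rate λ times the total perimeter. The function names are hypothetical. The numerical values (w = 80 nm².s⁻¹, r₀ = 25 nm, b = 17 nm) are taken from elsewhere in the text and used here for illustration only.

```python
import math

def disk_radius_nm(t_s, w_nm2_per_s, r0_nm):
    """Radius of a disk whose AREA grows by w per unit time, starting at r0.

    area(t) = pi * r0**2 + w * t, so the radius grows like sqrt(t) at long times.
    """
    area = math.pi * r0_nm ** 2 + w_nm2_per_s * t_s
    return math.sqrt(area / math.pi)

def tip_speed_nm_per_s(w_nm2_per_s, b_nm):
    """Filament tip speed w/b, with b the spacing between turns of a spiral."""
    return w_nm2_per_s / b_nm

def spiral_nucleation_rate_per_s(lam_per_um_per_s, radii_nm):
    """Total nucleation rate lambda * P, with P the summed disk perimeters (in um)."""
    total_perimeter_um = sum(2.0 * math.pi * r for r in radii_nm) / 1000.0
    return lam_per_um_per_s * total_perimeter_um

# Illustrative values (assumptions for this sketch, see lead-in).
W, R0, B = 80.0, 25.0, 17.0
for t in (0.0, 100.0, 400.0):
    print(f"t = {t:5.0f} s  ->  radius = {disk_radius_nm(t, W, R0):6.1f} nm")
print(f"tip speed = {tip_speed_nm_per_s(W, B):.2f} nm/s")
```

Because the area, not the radius, grows linearly in time, each disk's radial growth slows as it gets larger, while the total perimeter (and hence the nucleation rate) keeps increasing until the surface is covered.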

We then tested experimentally whether three key features of 
our theoretical model are indeed fulfilled during the generation 
of Snf7 assemblies. First, both the existence of initial single rings 
and the growth into spirals imply that Snf7 filaments have a 
preferred high curvature. Since the average radius of rings is in 
the range 25-30 nm, this might correspond to the preferred cur- 
vature. Second, the proposal of secondary nucleation of spirals 
implies that new spirals can form from existing ones. Third, poly- 
merization arrest should be correlated with contacts between 
neighboring disks. Finally, we also sought to determine independently
the parameters w and λ, whose ratio (w/λ) had been estimated
from the fit of the size distribution of spirals (Figure 3D). To
address all this, we studied the molecular dynamics of spiral 
growth by total internal reflection fluorescence microscopy 
(TIRFM) and by dynamic high-speed AFM (HS-AFM) imaging 
(Casuso et al., 2012). 

Snf7 Filament Dynamics Uncovers the Intrinsic Filament 
Curvature and the Mode of Nucleation 

To characterize the initial events leading to patch formation, we
first studied the early steps of Snf7 patch nucleation by TIRFM,
which allowed us to quantify the approximate number of Snf7
molecules in diffraction-limited spots from their fluorescence intensity
(see Experimental Procedures for quantification). Patch
nucleation started with the appearance of a fluorescent diffraction-limited
spot (nucleus, Figure 4A) containing 50 ± 20 monomers
(n = 9). At a Snf7 concentration of 300 nM, the intensity
of the nuclei remained constant for several minutes until these
nuclei started to grow (Figure 4A). Under these conditions, the
number of nuclei is very low (Figure 1C). To increase the number
of nuclei and to obtain more robust statistics, we nucleated Snf7
assemblies by adding 1 μM ESCRT-II and 1 μM Vps20 to a 75 nM
Snf7 solution. Under these conditions, many Snf7 nuclei



(G) TEM images of Snf7 rings, single (upper row) and double (lower row) stranded.

(H) Top: TEM image of a single Snf7 spiral. The Snf7 filament is underlined in green (resp. red) when double stranded (resp. single stranded). Bottom: color code of
the filament path from the most inner turn (red) to the most outer turn (purple). See also Figure S2D.

(I) AFM images of connections between filaments: 1 to 3, split filaments connecting two spirals; 4 and 5, filament split within a spiral; 6, a spiral filament.

(J) High resolution AFM topographic image of Snf7 filament splitting and branching within a single Snf7 spiral.

See also Figure S2.
appeared on the membrane surface (Figure 4B and Figure S4A)
and remained stable for several tens of minutes, consistent with 
our observations with Snf7 alone (see Figure 4A). 

The Snf7 nuclei have an average number of molecules of
60 ± 46 (mean ± SD; n = 1856, Figure 4C). If arranged in a
closed ring, these molecules would form a circle of about 30 nm radius,
considering a distance of 3.2 nm between Snf7 monomers
in the filaments (Shen et al., 2014) (radius = perimeter/(2π) =
(60 × 3.2)/(2 × 3.14) ~32 nm). This radius corresponds to those
observed in rings seen by EM (see Figure 2G), implying that the
arrested nuclei observed by TIRF could be closed rings. This
supports the idea that Snf7 patch nucleation starts with the appearance
of a single, highly curved Snf7 ring that would break to form a spiral.
To test this hypothesis further, we showed
through photobleaching experiments that breakage of the nuclei
induces patch formation (Figures S4B and S4C).
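
The ring-size estimate above is a one-line calculation, sketched here for convenience. It assumes only the relation used in the text (radius = perimeter / 2π, with 3.2 nm per monomer); the function names are hypothetical.

```python
import math

def ring_radius_nm(n_monomers, spacing_nm=3.2):
    """Radius of a closed ring made of n monomers spaced by spacing_nm."""
    return n_monomers * spacing_nm / (2.0 * math.pi)

def monomers_in_ring(radius_nm, spacing_nm=3.2):
    """Inverse relation: number of monomers needed to close a ring of given radius."""
    return 2.0 * math.pi * radius_nm / spacing_nm

print(f"60 monomers    -> radius of {ring_radius_nm(60):.1f} nm")
print(f"27 nm ring     -> about {monomers_in_ring(27):.0f} monomers")
```

With 60 monomers the expression evaluates to roughly 30.6 nm, close to the ~30 nm and ~32 nm figures quoted in the text. Conversely, a 27 nm ring as measured by EM would contain about 53 monomers, within the 50 ± 20 range reported for single nuclei.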

The formation of highly curved nucleation rings suggests that
Snf7 filaments have a preferred radius of curvature in the 25-30 nm
range. Indeed, we measured a 27 nm radius by
EM and estimated a 32 nm radius by TIRF, in line with other
studies (21 nm (Shen et al., 2014) and 32 nm (Henne et al.,
2012)). In spirals, this preference is not satisfied, as outer
turns are under-curved (123 nm radius, Figure 2C) and inner
turns are over-curved (18 nm radius, Figure 2D). These forced
suboptimal radii of curvature may induce significant mechanical
stresses in the Snf7 filaments, which might in turn underlie Snf7
biological function in membrane deformation.

To explore the existence of such internal stresses, we used the 
HS-AFM tip as a nanodissector (Scheuring et al., 2003), briefly 
applying strong forces to partially break a densely packed array 



Figure 3. Modeling of Snf7 Patch Growth 

(A) A putative scenario for the nucleation and 
growth of Snf7 spirals into a patch: new spirals are 
formed from filaments protruding from pre-existing 
spirals. The new spirals separate from the mother 
spiral by filament break. 

(B) Schematic of the theoretical model for Snf7 patch
growth. Snf7 spirals are represented by disks. Disks
are created with an initial radius r₀. Their area grows
at a constant rate (w), which leads to a radius
growing as the square root of time (upper graph and
black curves). New spirals are nucleated over time
proportionally to the spiral nucleation rate λ and to
the total perimeter of existing disks.

(C) Pictorial representations of a small membrane 
area being covered with Snf7 disks at the begin- 
ning (left) and at the end (right) of the growth 
process. 

(D) Cumulative distribution of spiral sizes (dots,
calculated from Figure 2C) fitted with our theoretical
model (line), imposing r₀ = 25 nm. The single fit
parameter (w/λ) is equal to 9.8 × 10“^ μm³.



of Snf7 spirals. After breakage, 8 large
disks were transformed into 29 small circles
(Figure 4D), with a radius of 17 ±
5 nm (see size distribution in Figure S4D).
This is consistent with a scenario where
Snf7 filaments were excised by rupture
from the outer circles and curled back to a radius closer to their
preferred radius. The nanodissector experiment indicated that
even if polymerized at low curvature, Snf7 filaments
kept their ability to curl into smaller rings.

The radius of these broken filaments (17 nm) is smaller than that
of the initial ring (~27 nm), but very close to the size of the
inner turns in large spirals (18 nm; see Figure 2D). This could
be consistent with a preferred radius of curvature of 17 nm.
Alternatively, the nanodissector-induced rings may have experienced the
lateral pressure of their neighbors, forcing them to a smaller
radius. To discriminate between these two hypotheses, we studied the
nucleation of new spirals from pre-existing Snf7 assemblies at
molecular resolution by HS-AFM. We focused on areas where
Snf7 spirals were already packed, but free membrane was still
available (Figures 4E and 4F; Movies S5 and S6). Our image sequences
showed that newly formed spirals were mainly initiated
from filaments protruding from pre-existing spirals (Figure 4E),
revealing the mechanism by which spiral nucleation occurs.
The outer radius of these spirals grew with time while forming
new turns (Figures 4F and 4G). While growing, bundled filaments
can transiently separate and interact laterally with the neighboring
bundles (Figure 4H).

Strikingly, when the number of concentric circles increased from 2 to 3, the
radius of the innermost circle was reduced from 22 nm to
14 nm (Figure 4G). These observations strongly support our hypothesis
that the preferred radius of curvature of Snf7 oligomers
is about 25 nm, but that lateral pressure can induce higher
curvatures.

The radial growth of free spirals was initially rapid, but then
slowed down (Figure 4I), roughly following a dependence on
the square root of time as expected from our model. From these
data, we measured an area growth rate w of 80 ± 36 nm².s⁻¹
(n = 5; see Experimental Procedures for quantification) at
1 μM. We estimated from w a growth rate of approximately 3
subunits.s⁻¹.μM⁻¹ (Supplemental Information, Supplemental
Mathematical Modeling). The estimated filament growth rate is
in the range of other filament rates: actin and tubulin are in the
range of 5-10 subunits.s⁻¹.μM⁻¹. Using our previous estimate
(w/λ) = 9.8 × 10“^ μm³ and this experimental value for w, we obtained
the spiral nucleation rate as λ = 8.2 × 10“^ spiral.μm⁻¹.s⁻¹
at [Snf7] = 1 μM. This secondary spiral nucleation rate is 500
times larger than the initial patch nucleation rate (Supplemental
Information, Supplemental Mathematical Modeling). This validates
our assumption that patch nucleation is negligible in our
theoretical model.

In summary, these observations indicated (1) a preferred high 
curvature of the Snf7 filaments and (2) the mechanism of spiral 
nucleation from existing spirals. We then set out to study the third
feature of our model: whether lateral contacts between spirals
can inhibit their growth.

Polymerization of Snf7 Filaments Induces Compression 
of the Spirals 

AFM images of packed arrays of spirals showed that filaments at 
the contact zone between spirals were flattened, resulting in spi- 
rals acquiring a polygonal shape that was more pronounced for 
longer incubation times (Figure 5A). Moreover, the central area of 
some of these polygons was pushed toward the substrate, as 
seen in the height profile of AFM images (Figure 5B). Also, the 
centers of these spirals were often found to be stiffer as seen 
in AFM mechanical maps (Figure 5B). We reasoned that this 
deformation reflected lateral compression of the spirals as the 
membrane became covered with Snf7. To study the correlation
between polymerization rate and lateral compression of the
Snf7 assemblies, we sought to measure them simultaneously.

We started by reconstituting Snf7 polymerization on GUVs 
adhered to a glass surface, to avoid any displacement during 
time-lapse imaging (Figure S5A) and followed the increase of 
Snf7 fluorescence on the vesicle with time. In supported bilayers, 
which have smaller lipid mobility than free bilayers, diffusion of 
Snf7 assemblies is very limited. In contrast, Snf7 assemblies
diffused faster on GUVs, which explains the homogeneous
distribution of the fluorescence (Figure 5C).

The dynamics of saturation of Snf7 polymerization over the 
entire GUV (Figure 5C, bottom) was similar to the dynamics of 
coverage at a single point in the supported bilayer experiments 
(Figure S5B): after an approximately exponential increase of
the fluorescence, the dynamics of coverage saturated through
a progressive slowdown phase. If the polymerization rate were independent
of lateral compression, as it is in our theoretical model,
an abrupt arrest of growth would be expected (Figure S5C). Conversely,
a polymerization rate dependent on lateral compression would
cause a progressive slowdown until reaching saturation.

As an indication of lateral compression, we noticed that
coated GUVs underwent dramatic morphological changes
after several hours of Snf7 polymerization: GUVs were no longer
spherical and instead showed extremely irregular
shapes, similar to rigid, dented table tennis balls (Figure 5D).
Upon aspiration into a micropipette, they deformed plastically
(Figure 5E and Figure S5D), showing that Snf7 solidifies the
membrane. These observations are consistent with a scenario
where a rigid Snf7 coat generates pressure on the GUVs,
stretching their membrane.

We reasoned that the accumulation of lateral compression in
the Snf7 coat would stretch the underlying membrane,
increasing its tension (Figure 5F). In order to follow the dynamics
of accumulation of lateral compression within the Snf7 layer, we
directly measured membrane tension generation during Snf7
polymerization. Using optical tweezers, we pulled a thin tether
from a GUV held in an aspiration pipette, which allows membrane
tension to be monitored through the measurement of the force F
exerted on the optical tweezers' bead (Cuvelier et al., 2005) (Figure 5G).
We then flowed a 500 nM solution of Snf7 using an injection
pipette, which triggered protein assembly onto the membrane
(Figure 5H). Monitoring Snf7 fluorescence, we found that
its membrane binding dynamics was identical to that measured
previously, reaching the saturation value after 10-20 min (Figure 5I,
top). Concomitant with this saturation, we observed an increase
in the force exerted by the tube, indicating an increase of
membrane tension (Figure 5I, bottom). These data show that
Snf7 polymerization occurs at a slower rate as compression increases
within the Snf7 layer, suggesting a coupling between
polymerization and compression. The dependence of the polymerization
energy μ (polymerization energy per unit surface) on
the changes in tube force is captured by the expression (Supplemental
Information, Supplemental Mathematical Modeling
part 4.3):



$$\mu = \frac{F_f^2 - F_i^2}{8\pi^2\kappa} \qquad (1)$$

where F_i = 8 pN and F_f = 35 pN are the tube forces before and
after Snf7 polymerization and κ = 4.8 × 10⁻²⁰ J is the membrane
bending rigidity (Figure S5E). These values yield
μ = 3.1 × 10⁻⁴ J.m⁻², which is a fairly high polymerization energy,
twice that of clathrin (Saleem et al., 2015), which by itself
is able to cause membrane deformation during endocytosis.
This indicates that the polymerization force of Snf7 can plausibly
cause membrane deformation.
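
The order of magnitude of the polymerization energy can be checked with a short calculation. This sketch assumes the standard relation between the force needed to hold a membrane tether and the membrane tension, F = 2π√(2κσ), and takes the polymerization energy per unit surface to be the resulting change in tension. Both are assumptions of this example rather than a restatement of the supplemental derivation. The inputs are the tube forces (8 pN and 35 pN) and the bending rigidity (read here as 4.8 × 10⁻²⁰ J) quoted in the text; the function name is hypothetical.

```python
import math

def tension_from_tube_force(force_N, kappa_J):
    """Membrane tension (N/m) from tether force, assuming F = 2*pi*sqrt(2*kappa*sigma)."""
    return force_N ** 2 / (8.0 * math.pi ** 2 * kappa_J)

KAPPA = 4.8e-20      # bending rigidity in J (value as read from the text)
F_BEFORE = 8e-12     # tube force before Snf7 polymerization, in N
F_AFTER = 35e-12     # tube force after Snf7 polymerization, in N

sigma_before = tension_from_tube_force(F_BEFORE, KAPPA)
sigma_after = tension_from_tube_force(F_AFTER, KAPPA)
delta_sigma = sigma_after - sigma_before

print(f"tension before: {sigma_before:.2e} J/m^2")
print(f"tension after:  {sigma_after:.2e} J/m^2")
print(f"increase:       {delta_sigma:.2e} J/m^2")
```

Under these assumptions the tension increase comes out at about 3.1 × 10⁻⁴ J.m⁻², which matches the polymerization energy reported above.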

These data are therefore compatible with a scenario where 
Snf7 spirals act as 2D springs that load themselves through poly- 
merization. Constrained by their neighbors, Snf7 spirals would 
deform significantly during polymerization, from a disk-like to a 
polygonal shape, until deformation and polymerization force 
are balanced. To test quantitatively whether the polymerization 
force is sufficient to deform Snf7 spirals, we established a theo- 
retical elastic model of spiral compression. 

In the model, the Snf7 layer is approximated by a hexagonal
lattice of individual spirals. Each spiral is composed of a collection
of concentric filaments spaced by a distance b = 17 nm up to
a typical radius R = 130 nm, similar to the experimental data
shown in Figure 2C. The filaments are circular in the absence
of lateral compression, but may deform into hexagons with
rounded vertices to accommodate tighter packing, as illustrated
in Figures 5A and 5F. The amount of energy required to deform
circular filaments into more polygonal shapes depends on the
Figure 4. Nucleation and Growth of Snf7 Spirals on Supported Membranes

(A) Left: TIRF microscopy kymographs of the nucleation of single Snf7 patches (green) at [Snf7] = 300 nM. Arrows indicate single ring to multiple spirals transition 
as postulated from the interpretation of these observations (right). 

(B) TIRF microscopy image of Snf7-Alexa488 dots (green) nucleated by ESCRT-II, [Snf7] = 75 nM, [Vps20] = 1 μM, [ESCRT-II] = 1 μM. Inset: zoom on 4 diffraction-limited
spots (scale bar, 2 μm).

(C) Histogram of the estimated number of Snf7 molecules within the dots nucleated by ESCRT-II (n = 1856). 

(D) HS-AFM nanodissection experiment (see text) of Snf7 spirals. 2 cycles of high AFM force were applied, between 0 s and 10 s, and between 10 s and 20 s. 

(E) HS-AFM time-lapse sequence showing the appearance of a new Snf7 spiral from pre-existing ones. Arrowheads show: filament protruding from a spiral (t =
8.5 s), filament curling from its tip (t = 17.0 s), and forming a small spiral (t = 37.4 s), growth of a second turn in the spiral (t = 152.2 s) and filament rearrangements
(t = 164.9 s).

stiffness of the Snf7 filaments, which is characterized by their
persistence length ℓp. We estimated this persistence length
from the amplitude of the thermal fluctuations of isolated Snf7
filaments on a supported bilayer observed by HS-AFM. We
obtained ℓp = 260 nm (Supplemental Information, Supplemental
Mathematical Modeling part 2). This is in order-of-magnitude
agreement with an estimate from a numerical model of Snf7
flexion (ℓp ~ 800 nm) (Shen et al., 2014). This value is higher
than for DNA (ℓp = 50 nm), but smaller than for cytoskeletal filaments
(ℓp = 15 μm for actin and ℓp = 6 mm for microtubules) (Howard,
2001), implying that Snf7 filaments are intrinsically soft enough to be
deformed by moderate forces.

In our theoretical 2D spring model, spirals become significantly
deformed when μ exceeds a threshold surface energy
$\mu^* = \frac{1}{2 - \pi/\sqrt{3}}\,\frac{k_B T\,\ell_p}{R^2 b}\,\log(R/b)$ (where k_B is the Boltzmann
constant and T the temperature, see Supplemental Information,
Supplemental Mathematical Modeling part 3). Our
experimental estimates for R (~130 nm), b (17 nm) and ℓp
(260 nm) implied μ* = 4.0 × 10⁻⁵ J.m⁻², 8 times smaller than
the measured experimental μ. This indicates that the Snf7 polymerization
energy, even if underestimated, is sufficient to
induce strong deformations. Our data show that the Snf7 spirals
can deform as spiral springs, and can self-load through a
mechanism where deformation is mostly generated by growth
of a filament.
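
The threshold surface energy can be evaluated directly from the quoted parameters. This sketch assumes that the threshold expression reads μ* = [1/(2 − π/√3)] · (k_B T ℓp / (R² b)) · log(R/b), that the logarithm is the natural logarithm, and that T = 298 K. The temperature is an assumption of this example; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def threshold_surface_energy(R_m, b_m, lp_m, T_K=298.0):
    """Threshold surface energy mu* (J/m^2) above which spirals deform significantly."""
    prefactor = 1.0 / (2.0 - math.pi / math.sqrt(3.0))
    return prefactor * (K_B * T_K * lp_m / (R_m ** 2 * b_m)) * math.log(R_m / b_m)

R = 130e-9      # typical spiral radius (m)
B = 17e-9       # spacing between filament turns (m)
LP = 260e-9     # persistence length (m)
MU = 3.1e-4     # measured polymerization energy (J/m^2), as quoted in the text

mu_star = threshold_surface_energy(R, B, LP)
print(f"mu*       = {mu_star:.2e} J/m^2")
print(f"mu / mu*  = {MU / mu_star:.1f}")
```

With these inputs μ* is about 4 × 10⁻⁵ J.m⁻² and the measured μ exceeds it by a factor of roughly 8, in line with the comparison made above.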

Snf7 Spirals Expansion Leads to Membrane 
Deformation 

Having established that the Snf7 spirals can deform to store sig- 
nificant elastic energy, we wondered if they could release this 
energy to deform the membrane. After several hours of Snf7 in- 
cubation, holes (called “pores” in the following) spontaneously 
appeared in a few GUVs, releasing membrane tension. Surprisingly,
instead of bursting, the GUV membrane shrank from the
rim of the pore toward the opposite side of the GUV (Movie
S7). Occasionally, this process stopped before the vesicle had 
fully collapsed, and stable vesicles with open pores were 
observed (Figure 6A; Movie S8). In this case, a stronger fluores- 
cence signal is seen at the rim. 

To understand the stronger signal of the membrane marker at 
the rim, we imaged these opened GUVs by thin-section EM: the 
membrane at the rim of the pore was rolled toward the interior of 
the vesicle (Figure 6B). This process is known as curling, and has 
previously been observed in a number of situations, including 
polymersomes (Mabrouk et al., 2009) and during the bursting 
of red blood cells (Callan-Jones et al., 2012). It occurs when an 
area difference appears between the two sides of a bi-layered 
surface. We hypothesized that curling could be driven by the 
expansion of the previously compressed Snf7 layer following 
pore formation. To quantitatively study the plausibility of such a
scenario, we used our theoretical model to compute the ratio
of the surface occupied by a compressed spiral (A_compressed) to
that of a relaxed spiral (A_relaxed) (see Supplemental Information,
Supplemental Mathematical Modeling part 3) as

$$\frac{A_{\mathrm{compressed}}}{A_{\mathrm{relaxed}}} = \ldots \qquad (2)$$



The value of this ratio implies that, during stress release,
the Snf7 layer would expand by 6% relative to the underlying,
almost inextensible lipid bilayer (Figure 6C). As a result, the
membrane of the GUV would curl inward (Figure 6D), consistent
with our observations in fluorescence and electron
microscopy. The preferred radius of curvature of the curl can be estimated
as $r_c = \frac{d}{2}\,\frac{A_{\mathrm{relaxed}} + A_{\mathrm{compressed}}}{A_{\mathrm{relaxed}} - A_{\mathrm{compressed}}}$, where
2d is the total thickness of the lipid-Snf7 sandwich. Using AFM
to measure the thickness of the Snf7-coated membranes, we
estimated 2d = 9 nm (5 nm for the membrane plus 4 nm for the
Snf7 coat). With this value, we can estimate r_c = 37 nm. Considering
previous studies (Callan-Jones et al., 2012), r_c corresponds
to the curvature of the innermost roll observed (Figure 6D).
Experimentally, we find a mean radius of r_c = 39 ± 6 nm (n = 9).
Therefore, curling in the opened vesicles can be explained by
the expansion of the Snf7 spiral springs. During this expansion,
the spirals release the compression energy accumulated during
polymerization.
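
The link between area mismatch and curl radius can be illustrated numerically. This sketch assumes the curl-radius expression reads r_c = (d/2)(A_relaxed + A_compressed)/(A_relaxed − A_compressed), with 2d = 9 nm. The inverse function is added for this example only; it recovers the area ratio that a given curl radius implies. Both function names are hypothetical.

```python
import math

def curl_radius_nm(d_nm, a_relaxed, a_compressed):
    """Curl radius for a two-layer sheet of total thickness 2d with mismatched areas."""
    return 0.5 * d_nm * (a_relaxed + a_compressed) / (a_relaxed - a_compressed)

def area_ratio_from_curl_radius(d_nm, r_c_nm):
    """Inverse relation: A_relaxed / A_compressed implied by a given curl radius."""
    x = 2.0 * r_c_nm / d_nm
    return (x + 1.0) / (x - 1.0)

D = 4.5  # half of the 9 nm lipid-Snf7 sandwich thickness, in nm

ratio = area_ratio_from_curl_radius(D, 37.0)
print(f"A_relaxed / A_compressed = {ratio:.3f}")
print(f"equivalent linear expansion = {100 * (math.sqrt(ratio) - 1):.1f} %")
print(f"round trip: r_c = {curl_radius_nm(D, ratio, 1.0):.1f} nm")
```

Under this reading, a 37 nm curl radius corresponds to an area ratio of about 1.13, which is an expansion of roughly 6% in each linear dimension.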

We wondered how much energy was stored in a single spiral,
as compared to the energy required for budding of a vesicle. Our
elastic model implies that the lateral compression of a single
Snf7 spiral corresponds to the accumulation of an elastic energy
ΔE = 170 k_BT = 7.0 × 10⁻¹⁹ J, which is larger than the bending
energy 4πκ = 160 k_BT required to form a spherical membrane
bud (see Supplemental Information, Supplemental Mathematical
Modeling). Thus, a single spiral can accumulate enough elastic
energy to form a spherical bud when released.
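
For readers who prefer SI units, the two energies compared above can be converted from thermal units. This sketch assumes a temperature of 298 K, which is an assumption of this example, and uses a hypothetical helper name.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K

def kbt_to_joules(n_kbt, T_K=298.0):
    """Convert an energy expressed in units of k_B*T into joules."""
    return n_kbt * K_B * T_K

stored = kbt_to_joules(170)    # elastic energy stored in one compressed spiral
budding = kbt_to_joules(160)   # bending energy quoted for forming a bud

print(f"stored elastic energy: {stored:.2e} J")
print(f"budding energy:        {budding:.2e} J")
print(f"ratio stored/budding:  {stored / budding:.2f}")
```

At 298 K, 170 k_BT is about 7.0 × 10⁻¹⁹ J, so the stored energy exceeds the quoted budding energy by a margin of about 6%.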

In summary, we show here that Snf7 filaments display the abil- 
ity to act as spiral springs that load through polymerization. The 
release of the compression stress accumulated during the defor- 
mation of the Snf7 spiral is sufficient to drive membrane 
deformation. 



DISCUSSION 

In this study, we first showed that lipid membranes trigger the formation
of wild-type Snf7 assemblies at their surface through a
process of nucleation and growth. The patch nucleation rate is low
(less than 1 seed.μm⁻².hour⁻¹), which explains why a
polymerization-activated mutant was needed to observe the same assemblies
by EM in previous studies (Henne et al., 2012). We found that the
circular arrays formed by Snf7 on these membranes are spirals
made of a single filament looping and interacting with itself. This
confirms that the interaction with membrane in vitro retains the


(F) HS-AFM time-lapse sequence of an isolated Snf7 spiral. Arrowheads show: growth of the spiral at the two-turn stage (t = 67.2 s), and filament split (t = 75.7 s) 
leading to the three turns stage. 

(G) The equatorial kymograph (yellow rectangle) of this growing spiral: the innermost turn radius decreases from 22 nm to 14 nm upon formation of the third turn.

(H) Dynamics of filament splitting and fusing in two Snf7 spirals (rows) observed by HS-AFM. Arrowheads show displacement of the splitting points. 

(I) Time plot of the outer radius of five growing Snf7 spirals followed by HS-AFM. The origin of all curves is the appearance of the first turn. The thick curve is the
average of all curves. [Snf7] = 1 μM. See also Figure S4.
Figure 5. Build-up of Lateral Compression
in Snf7 Spirals by Polymerization 

(A) HS-AFM images of Snf7 spirals acquiring 
polygonal shapes with time. 

(B) AFM Topography and nanomechanical map- 
ping of polygonal Snf7 spirals. A significant pro- 
portion of spirals (dashed outlines) have a lower 
center with increased mechanical stiffness. 

(C) Snf7 polymerization on GUVs made of DOPC 
60% / DOPS 40% + Rhodamine-PE 0.1% (red), 
0.003% DOPE-Peg2000-Biotin. GUVs are incu- 
bated with 500 nM Snf7-Alexa488 (green). Top: 
SDC images of a GUV equatorial plane during 
Snf7 polymerization. Bottom: fluorescence in- 
tensity (equatorial plane) of 4 GUVs with time. 

(D) GUVs before (top) and after (bottom) several 
hours of incubation with Snf7-Alexa488. 

(E) Snf7 coated GUVs keep the aspirated shape 
after release from the micropipette. 

(F) Sketch of membrane stretching by Snf7 spiral 
compression. 

(G) Schematic of the membrane tension mea- 
surement setup combining holding pipette, injec- 
tion pipette, bead within an optical trap, giant 
vesicle (red) and Snf7 (green). 

(H) Top image: SDC image of a membrane tension 
measurement experiment (red = membrane, 
green = Snf7-Alexa488). Note that Snf7-Alexa488 
did not polymerize on the membrane nanotube. 
Bottom: brightfield image of the same vesicle. The 
yellow cross indicates the resting position of the 
bead held by the optical trap. 

(I) Top: Normalized Snf7 fluorescence intensity 
versus time (measured from equatorial plane); 
bottom: force exerted by the membrane nanotube 
on the bead versus time. See also Figure S5. 
spiral structure recently observed in solution (Shen et al., 2014) or
in vivo (Cashikar et al., 2014; Hanson et al., 2008). We also find that
Snf7 filaments can bundle into double-stranded filaments, probably
through parallel lateral interactions.

In our assay, spirals become tightly packed into a polygonal lattice
at the surface of the supported bilayers. The packing of
these spirals is correlated with the increase of lateral compression
within the Snf7 coat. These data imply that ESCRT-III spirals,
because of the relatively high flexibility
of the Snf7 filament, can be
deformed by lateral compression. Moreover,
we show that the expansion of compressed
spirals can lead to membrane
deformation if confinement is released
(Figure 6). These observations imply that
Snf7 can work as a two-dimensional
spring, being able to compress and
expand. In the following, we discuss
how this spring-like activity highlighted
by our study is relevant for the in vivo
situation.

In vivo, it is unlikely that a densely
packed array of ESCRT-III spirals is present
at the surface of membranes, which
may call into question the physiological relevance of the spiral compression
observed in large patches of ESCRT-III. But the confinement
required for such lateral compression might come from
other membrane proteins, which may provide walls within which
single spirals could be confined. In the membrane of MVBs,
Lamp1 and 2 are particularly enriched (Bissig and Gruenberg,
2014) and may provide a scaffold against which ESCRT-III spirals
could be compressed. Of course, compression being isotropic
Figure 6. Snf7 Lateral Pressure and Expansion-Induced
Membrane Deformations

(A) Confocal sections of Snf7 coated vesicles dis- 
playing stable holes. Fluorescence is more intense 
at the rim of the pore. 

(B) EM thin section image of a Snf7 coated vesicle 
with a stable pore. Note the curling of the mem- 
brane rim. Several other examples of membrane 
curling are shown in lower panels. 

(C) Sketch of the expected curvature generated by 
expansion of compressed Snf7 spirals. 

(D) Sketch of the pore opening and curling of Snf7 
coated vesicle. Bottom images show the expected 
section of a stable pore in the GUV and a zoom on 
the membrane curled region. 







in this in vivo case, compressed spirals would stay circular, 
instead of polygonal. 

But another source of lateral compression is intrinsic to the
spiral structure, and is present in a single Snf7 spiral even in the
absence of any external confining structures. We show that
the filaments curl spontaneously at a 20-30 nm radius, implying that if
they grow at a different radius, they are under mechanical stress.
Indeed, when spirals are broken, all the pieces of filaments
curl further to a smaller radius. Thus, in the spiral structure, filaments
with a radius larger than 25 nm are stretched, compressing
the inner filaments. Accordingly, we
find that even for non-laterally constrained
spirals, the inner turn of the spirals
tightens when the number of turns in
the spirals reaches three.

Whatever the source of compression is,
our observations show the ability of Snf7
spirals to deform elastically and accumulate
potential energy that can be used for
membrane deformation. But how would
such energy drive membrane budding?
It was previously proposed that the polymerization
of ESCRT-III could enclose a
patch of membrane and then, by reduction
of the length of the Snf7 rim, the membrane
would be folded into a bud in the
middle of the Snf7 polymer. The ESCRT-III
rim reduction has been proposed to
be mediated by depolymerization of the
Snf7 spiral (lasso model; Saksena et al.,
2009) or by further inward polymerization
of the Snf7 spiral (Cashikar et al., 2014).
Our data suggest that the Snf7 spiral spring
could mediate the rim reduction by its
elastic compression down to a 14 nm
radius. However, in this scenario, because
the membrane is fluid, it is difficult to picture
how the force of the spiral spring
would be transmitted to the membrane.
We propose that cargoes play an essential
role in force transmission (Figure 7,
left): the rim reduction would lead
to compaction of enclosed membrane cargoes. Theoretical
(Derganc et al., 2013) and experimental studies (Stachowiak
et al., 2012) indicate that highly dense cargoes could mediate
budding by asymmetric crowding. This is consistent with a
recent in vivo study (Mageswaran et al., 2015) where ILV
budding was critically dependent on accumulation of cargoes
within ESCRT-III assemblies.

Another possibility is that the out-of-plane buckling of the
spring itself might drive the invagination of the membrane (Figure 7,
right) (Lenz et al., 2009), which could explain the formation
of membrane tubules by overexpression of the human homolog
of Snf7 (Hanson et al., 2008). This model implies that the flat
membrane could be a metastable state of this elastic system:
the stored elastic energy could be suddenly released upon
external activation. Our observation that the spirals adopt a
curved inverted dome shape (Figure 5B) is consistent with this
model, and an estimate of the energy stored in one spiral further
confirms that this energy is sufficient to bud the membrane into a
sphere.

The spring-like properties of Snf7 filaments also inform our
understanding of the role of ESCRT-III in membrane fission. Because
of their high flexibility, Snf7 filaments can grow at radii different
from their preferred radius of curvature if steric or mechanical
constraints force them to do so. This feature explains how
Snf7 filaments could adapt to the wide range of radii observed
in the various ESCRT-III-mediated fission reactions: from microns
in abscission and hundreds of nanometers in virus
budding, down to tens of nanometers in ILV formation and
membrane repair (Jimenez et al., 2014). However, the smallest
size of the inner turn is on average 18 nm radius, which is far
from the 1.4 nm observed with dynamin to finalize fission
(Sundborger et al., 2014). This raises questions regarding the
mechanism necessary to provide fission and pore closure
and supports the role of other ESCRT-III proteins and lipids in
these reactions.

EXPERIMENTAL PROCEDURES 
Protein Purification and Labeling

Snf7 (Addgene plasmid no. 21492), ESCRT-II (Addgene plasmid no. 17633) and
Vps20 (Addgene plasmid no. 21490) were purified as previously described (Hierro
et al., 2004; Wollert et al., 2009). Snf7 stock solution was 2.5 μM in 20 mM
HEPES, 100 mM NaCl (pH 8). Snf7 was labeled either with TFP-Alexa-488 (Life
Technologies product no. A37563) or with NHS-Atto 647N (Atto-tec product no.
AD 647N-3). ESCRT-II (20 μM stock solution in 50 mM Tris, 150 mM NaCl, 5 mM






Figure 7. Models of ESCRT-III-Mediated 
Budding and Fission of Intra-lumenal Vesicles 

Left: cargo sequestration and ESCRT-III lateral 
compression induce membrane budding. 
Further ESCRT-III narrowing might lead to fission. 
Right: ESCRT-III lateral compression leads to 
buckling. 



β-mercaptoethanol [pH 7.5]) and Vps20 (10 µM 
stock solution in 20 mM HEPES, 100 mM NaCl 
[pH 7.5]) were kept unlabeled. 

Giant Unilamellar Vesicles and Large 
Unilamellar Vesicles Preparation 

GUVs were prepared by electroformation using 
DOPC and DOPS mixtures, purchased from Avanti 
Polar Lipids (Alabaster, USA). When necessary, 
0.1% fluorescent lipids (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-(lissamine 
rhodamine B sulfonyl)) (Rhodamine-PE) were added. 

LUVs were prepared by evaporating, in a 
round-bottom glass tube, a volume of lipid mix 
(DOPC:DOPS, 6:4, mol:mol) containing 1 mg of 
total lipids. After addition of 200 µl of buffer, the 
tube was vortexed and freeze-thawed 3 times. This solution is kept at 
-20°C until use. 

Unless otherwise noted, the buffer used for all experiments is composed of 
20 mM Tris-HCl (pH 6.8), 200 mM NaCl, 1 mM MgCl2. 

Optical Microscopy of Membrane Assays 

For confocal and TIRF imaging, a coverslip is cleaned with water and ethanol, 
and then plasma-cleaned for 2 min (PDC-32G, Harrick Plasma, NY, USA). The 
coverslip is assembled to a flow chamber (sticky-Slide VI 0.4, Ibidi, Munich, 
Germany), with one entry connected to a syringe pump (Aladdin, World Precision 
Instruments, Sarasota, FL, USA), and the other left open for sequential 
introduction of other solutions. The flow chamber is initially filled with 200 µl of 
buffer. 5 µl of GUVs are flushed in the flow chamber (see Extended Experimental 
Procedures for methods to get supported membranes or partially 
adhered vesicles). 

Imaging is performed using an inverted microscope assembled by 3i (Intelligent 
Imaging Innovation, Denver, USA) and Nikon (Eclipse C1, Nikon, Tokyo, 
Japan). For SDC imaging, a 2-µm-thick volume stack (1 µm above and below 
the supported membrane) is acquired then rendered to 2D by maximum intensity 
projection. TIRF imaging is performed using a motorized Nikon TIRF system. 
The number of molecules within Snf7 oligomers is estimated by calibrating 
the microscope with commercially available fluorescent DNA origamis 
(GATTA-Brightness 9R and 18R, GATTAquant, Braunschweig, Germany) 
(Figure S6). 

Optical Tweezers Tube Pulling Experiment 

A modified version of a published setup (Morlot et al., 2012) allows simultaneous 
brightfield imaging, SDC microscopy, and optical tweezing on an inverted 
Nikon Eclipse Ti microscope. A GUV is aspirated within a micropipette 
connected to a motorized micromanipulator (MP-285, Sutter Instrument, 
Novato, CA, USA) and a pressure control system (MFCS-VAC -69 mbar, 
Fluigent, Villejuif, France) that sets the aspiration pressure ΔP. A membrane 
nanotube is then pulled out from the GUV through a streptavidin-coated 
bead (3.05 µm diameter, Spherotech, Lake Forest, IL, USA) held in a fixed optical 
trap. The optical trap was custom-made with a continuous 5 W 1064 nm 
fiber laser (ML5-CW-P-TKS-OTS, Manlight, Lannion, France) focused 
through a 100X 1.3 NA oil immersion objective. The force F exerted on the 
bead was calculated from Hooke's law: F = k·Δx, where k is the stiffness 
of the trap (k = 60 pN·µm⁻¹) and Δx the displacement of the bead from its 
equilibrium position. Snf7 was injected close to the nanotube with a second 
micropipette connected to another channel of the Fluigent pressure control 
system. 
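
As an illustration of the force calculation described above, the following minimal sketch applies Hooke's law with the stated trap stiffness of 60 pN/µm. The function name and the example displacement are hypothetical and are not taken from the original analysis.

```python
# Minimal sketch of the Hooke's law force estimate, F = k * dx.
# The stiffness default (60 pN/um) is the value stated in the text;
# the function name and the example displacement are illustrative only.

def trap_force_pn(displacement_um, stiffness_pn_per_um=60.0):
    """Return the force (pN) on a bead displaced by displacement_um (um)
    from the equilibrium position of the optical trap."""
    return stiffness_pn_per_um * displacement_um


if __name__ == "__main__":
    # A hypothetical bead displacement of 0.25 um gives 60 * 0.25 pN.
    print(trap_force_pn(0.25))
```

The sign of the displacement carries through to the force, so a bead pulled toward the vesicle and a bead pulled away from it give forces of opposite sign.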

Electron Microscopy 

For negative stain EM observations, LUVs were incubated with Snf7 in suspension, 
spun down (4 min at 4,000 g), washed, and then adsorbed onto glow-discharged 
Formvar-coated EM grids. The samples were negatively stained for 
30 s with 2% uranyl acetate before visualization. 

Ultrathin sectioning of Snf7-bound LUVs fixed in Epon was performed using 
a microtome (Leica Ultracut) at a cutting angle of 6°. Sections were put on 
glow-discharged carbon-coated Formvar grids and imaged with a Tecnai G2 
Sphera (FEI) electron microscope. 

AFM and HS-AFM 

For both PF-QNM AFM and HS-AFM, GUVs, prepared as described above, 
were adsorbed to the mica support followed by protein addition. For PF-QNM 
AFM experiments 5 µl of the GUVs and for HS-AFM experiments 0.5 µl 
of GUVs were deposited onto freshly cleaved mica supports pre-incubated 
with adsorption buffer (220 mM NaCl, 10 mM HEPES, 2 mM MgCl2 [pH 
7.4]). Supported lipid bilayers were first imaged to assess the quality of the lipid 
bilayer preparation before injecting Snf7 into the fluid cell to a concentration of 
~500 nM. Formation of Snf7 assemblies was observable ~30 min after Snf7 
injection. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
six figures, and eight movies and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.10.017. 

SUMMARY 

Increased mobility of chromatin surrounding double- 
strand breaks (DSBs) has been noted in yeast and 
mammalian cells but the underlying mechanism 
and its contribution to DSB repair remain unclear. 
Here, we use a telomere-based system to track 
DNA damage foci with high resolution in living cells. 
We find that the greater mobility of damaged chromatin 
requires 53BP1, SUN1/2 in the linker of the nucleoskeleton 
and cytoskeleton (LINC) complex, and 
dynamic microtubules. The data further demonstrate 
that the excursions promote non-homologous end 
joining of dysfunctional telomeres and implicate 
Nesprin-4 and kinesins in telomere fusion. 53BP1/ 
LINC/microtubule-dependent mobility is also evident 
at irradiation-induced DSBs and contributes to the 
mis-rejoining of drug-induced DSBs in BRCA1-deficient 
cells, showing that DSB mobility can be detrimental 
in cells with numerous DSBs. In contrast, 
under physiological conditions where cells have 
only one or a few lesions, DSB mobility is proposed 
to prevent errors in DNA repair. 

INTRODUCTION 

The integrity of eukaryotic genomes is perpetually threatened 
by the formation of double-stranded breaks (DSBs), which can 
arise due to errors in DNA metabolism or genotoxic insults, 
such as chemotherapeutic agents. The repair of DSBs is a 
critical aspect of genome maintenance, despite the fact that 
non-cycling cells experience only a few DSBs per day 
(Fumagalli et al., 2012; U. Herbig, personal communication). 
In G1, DSBs are repaired by non-homologous end-joining 
(NHEJ) whereas replicating cells can also use a second 
pathway, homology-directed repair (HDR), to restore genome 
integrity. NHEJ and HDR are highly regulated to avoid 
ectopic repair, which can generate translocations, mult- 
icentric chromosomes, and other deleterious chromosome 
rearrangements. 

The role of the DNA damage response factor 53BP1 in DSB 
repair and its contribution to cell-cycle-appropriate execution 
of NHEJ and HDR have been studied extensively (reviewed in 
Escribano-Diaz et al., 2013; Panier and Boulton, 2014; Zimmermann 
and de Lange, 2014). 53BP1 accumulates at sites of DNA 
damage through a dual interaction: its Tudor domain binds 
constitutively dimethylated histone H4 (H4K20diMe) and 
its UDR domain binds ubiquitylated histone H2A (H2AK15Ub), 
which marks sites of DNA damage. Many of the functions of 
53BP1 are mediated by binding partners that associate with 
the 53BP1 N terminus upon phosphorylation of S/TQ sites by 
the ataxia telangiectasia mutated (ATM) and ataxia telangiectasia 
and Rad3 related (ATR) kinases. 

A critical role of 53BP1 is to limit the 5' resection of the 
broken ends in a cell-cycle-dependent manner. Whereas inappropriate 
resection in G1 will impede the repair of DSBs by 
NHEJ, resection is needed for HDR in S/G2. Inhibition of 5' 
end resection in G1 is primarily mediated by the 53BP1-bound 
Rif1 and Rev7/MAD2L2, but the mechanism by which resection 
is blocked is unknown (Chapman et al., 2013; Di Virgilio et al., 
2013; Escribano-Diaz et al., 2013; Feng et al., 2013; Zimmermann 
et al., 2013; Boersma et al., 2015; Xu et al., 2015). In 
S/G2, the action of Rif1 and Rev7/MAD2L2 is counteracted 
by BRCA1, allowing resection and generating the 3' overhangs 
required for HDR. A second 53BP1-interacting factor, PTIP, has 
an auxiliary role that involves end trimming by the Artemis 
nuclease (Munoz et al., 2007; Callen et al., 2013; Wang et al., 
2014). 

The contribution of 53BP1 to DSB repair pathway choice has 
received considerable attention in the context of the treatment 
of BRCA1-deficient cancers with poly(ADP-ribose) polymerase 
inhibitors (PARPi) (reviewed in Banerjee et al., 2010). PARP inhibition 
results in a large number of persistent single-stranded (ss) 
gaps that are converted into DSBs by DNA replication. In the 
absence of BRCA1, the inefficiency of 5' end resection allows 
NHEJ to dominate the repair. When many broken ends persist, 
NHEJ can promote mis-rejoining of broken chromatids, forming 
radial chromosomes and chromosome aberrations that have lethal 
consequences. This mis-repair of DSBs determines the synthetic 
lethality of PARP inhibition and HR deficiency. Removal of 
53BP1 in this setting blocks the formation of mis-repaired chromosomes, 
in part by alleviating the inhibition of resection and 
hence restoring HDR (Cao et al., 2009; Bouwman et al., 2010; 
Bunting et al., 2010; Chapman et al., 2013; Di Virgilio et al., 
2013; Zimmermann et al., 2013; Xu et al., 2015). Indeed, absence 
of Rif1 or MAD2L2 also minimizes the formation of mis-repaired 
chromosomes in PARPi-treated BRCA1-negative cells. However, 
53BP1 has a greater effect than Rif1 (Zimmermann et al., 
2013), suggesting a second mechanism by which 53BP1 promotes 
mis-rejoining. 

We have used dysfunctional telomeres to investigate the second, 
Rif1-independent function of 53BP1. Mammalian telomeres 
are protected from the DNA damage response (DDR) by the six-member 
shelterin protein complex residing on the telomeric 
TTAGGG repeats (reviewed in Palm and de Lange, 2008). 
Removal of TRF2 from shelterin unleashes two pathways that 
normally are repressed at telomeres. Telomeres lacking TRF2 
activate ATM kinase signaling, leading to Chk2 phosphorylation 
and the accumulation of 53BP1 at telomeres. In addition, 
TRF2 loss from telomeres renders them highly susceptible to 
Ku70/80- and DNA ligase IV (Lig4)-dependent classical (c)-NHEJ. 

In addition to blocking resection at dysfunctional telomeres, 
53BP1 alters their mobility. After loss of TRF2, telomeres travel 
greater distances and roam larger subnuclear territories than 
functional telomeres (Dimitrova et al., 2008). This effect was 
also observed upon telomere deprotection with a TIN2 short 
hairpin RNA (shRNA) (Chen et al., 2013). The altered mobility of 
dysfunctional telomeres is strictly dependent on 53BP1 but not 
influenced by Rif1 (Zimmermann et al., 2013) or Rev7/MAD2L2 
(Boersma et al., 2015). Given that, in G1, the fusion of two telomeres 
involves chromosome ends that are spatially separated, 
we speculated that 53BP1-dependent mobility could stimulate 
c-NHEJ by increasing the chance that two ends become juxtaposed. 
Indeed, 53BP1 is required for telomere-telomere fusions 
(Dimitrova et al., 2008) and this dependency cannot be fully 
explained by the ability of 53BP1 to block resection (Zimmermann 
et al., 2013). 

In budding yeast, increased chromatin mobility occurs near an 
I-SceI-induced DSB and, to a lesser extent, at the level of global 
chromatin (Dion et al., 2012; Mine-Hattab and Rothstein, 2012; 
Seeber et al., 2013), possibly enhancing the homology search 
needed for HDR (Agmon et al., 2013). Similarly, in fission yeast, 
DSBs associate with the LINC complex in a process that promotes 
HDR (Swartz et al., 2014). However, the data on the 
mobility of DSBs in mammalian cells have been equivocal 
(reviewed in Dion and Gasser, 2013). Ionizing radiation (IR)-induced 
DSBs show an ATM-dependent increase in mobility 
(Neumaier et al., 2012; Becker et al., 2014), lesions induced by 
α-particles or I-PpoI have been inferred to move (Aten et al., 
2004; Falk et al., 2007; Gandhi et al., 2012), and directed movement 
occurs during telomere recombination in the context of the 
alternative lengthening of telomeres (ALT) pathway (Cho et al., 
2014). However, other findings have argued against an altered 
mobility of DSBs (Kruhlak et al., 2006; Soutoglou et al., 2007; Jakob 
et al., 2009). 

Using time-lapse imaging of conditional TRF2 knockout (KO) 
mouse embryonic fibroblasts (MEFs) as a model system, we 
demonstrate here that 53BP1-dependent chromatin mobility is 
mediated by microtubules and the LINC complex. The LINC 
complex spans the inner and outer membranes (INM and 
ONM, respectively) of the nuclear envelope (NE) and connects 
components of the cytoskeleton, including microtubules, with 
the inside of the nucleus such that cytoskeletal forces are transferred 
to the nuclear content (reviewed in Starr and Fridolfsson, 
2010; Wilson and Foisner, 2010; Chang et al., 2015). The key 
components of the mammalian LINC complex are the transmembrane 
SUN-domain proteins, SUN1 and SUN2, which 
span the INM and interact with the KASH-domain nesprin proteins 
in the lumen of the NE. Nesprins cross the ONM and connect 
to cytoplasmic filaments, including microtubules. Using 
microtubule poisons in combination with SUN1/2 and kinesin 
KO MEFs, we show that the 53BP1-dependent mobility of 
dysfunctional telomeres is a LINC/microtubule-dependent process 
that promotes NHEJ. Furthermore, we document that 
the same 53BP1/LINC/microtubule-dependent mechanism promotes 
the mobility of IR-induced DSBs and contributes to their 
mis-repair in PARPi-treated BRCA1-deficient cells. These results 
establish a feature of the DDR that can lead to aberrant 
DNA repair when cells sustain large numbers of breaks. We 
argue that this potentially dangerous system is adaptive in the 
context of the physiological DDR, which has evolved to ensure 
correct DNA repair in cells with few DSBs. 

RESULTS 

A Standardized Method for Analysis of Dynamic 
Behavior of DNA Damage Foci 

The mechanism of 53BP1-dependent mobility was studied using 
immortalized TRF2F/F Cre-ERT2 MEFs expressing an mCherry-53BP1 
fusion protein (mCherry-BP1-2) that contains the Tudor, UDR, and 
oligomerization domains of 53BP1 (Figure 1A). This mCherry fusion 
accumulates at DSBs and deprotected telomeres but is not 
functional and does not interfere with the function of the endogenous 
53BP1 (Dimitrova et al., 2008). 

As expected, mCherry-BP1-2 formed foci at the dysfunctional 
telomeres generated by Cre-mediated deletion of TRF2 (Figures 
1A-1C), allowing detection of the dynamic behavior of mCherry-marked 
dysfunctional telomeres using 3D time-lapse microscopy 
and automated tracing in deconvolved images (Figure 1D; 
Movie S1A). Since MEF nuclei are flat (2-4 µm in the z direction 
compared to 15-20 µm in x and y), the data were analyzed in 
2D-maximum intensity projected images. 

Although the resulting traces can be corrected for the nuclear 
translocation and rotation (Dimitrova et al., 2008), large-scale 
nuclear deformation, such as expansion, contraction, folding, 
and twisting, also confounds the analysis. We therefore developed 
a standardized method to select nuclei that do not display 
overt distortions. The method is based on three parameters 
(Figure S1) applied to the data after correcting for the translocation 
and rotation of the nuclei as described previously (Dimitrova 
et al., 2008). First, because extensive distortion of a 
nucleus will usually shift the geometrical center (Figure S1A, 
type I; Movie S2A), the maximal movement of the geometrical 
center (MMGC) of the nucleus was evaluated (Figure S1B). 
Second, to identify nuclei undergoing expansion or contraction 
(Figure S1A, type II; Movie S2B), the maximal difference 
between the average distances of the foci from the geometrical 
center (MAAD) was determined (Figure S1C). Third, we identified 
nuclei with groups of foci moving in the same direction, 
which could indicate nuclear folding, twisting, or rotation (Figure 
S1A, type III; Movie S2C). For this determination, the percentage 
of foci moving in the four different quadrants of the 
XY projections (upper right [UR]; lower right [LR]; upper left 
[UL]; lower left [LL]) was determined (Figures S1D and S1E). 
Sets of foci that show concerted movement will over-populate 
one of these quadrants, allowing detection of nuclei with distortions. 
Similarly, over-population of half the space in the projections 
(lateral, vertical, diagonal) was used to detect nuclear 
rotation. Nuclei that exceeded arbitrarily set thresholds for these parameters 
(Figure S1; see Experimental Procedures) were discarded 
from the analysis. In most experiments, approximately 
half the nuclei passed these selection criteria and were deemed 
to retain their shape. 
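
The three selection parameters described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the data layout (one list of (x, y) focus positions per time frame, plus one geometrical center per frame), the use of net first-to-last displacement to assign quadrants, and all threshold values are assumptions made for the example. The half-space test for rotation is omitted.

```python
import math

# Illustrative sketch of the three nuclear-distortion parameters.
# frames:  list over time of lists of (x, y) focus positions (same focus order)
# centers: list over time of the (x, y) geometrical center of the nucleus
# All thresholds below are hypothetical placeholders, not published values.

def mmgc(centers):
    """Maximal movement of the geometrical center, measured here
    (an assumption) relative to its position in the first frame."""
    x0, y0 = centers[0]
    return max(math.hypot(x - x0, y - y0) for x, y in centers)

def maad(frames, centers):
    """Maximal difference between the per-frame average distances
    of the foci from the geometrical center."""
    means = []
    for foci, (cx, cy) in zip(frames, centers):
        dists = [math.hypot(x - cx, y - cy) for x, y in foci]
        means.append(sum(dists) / len(dists))
    return max(means) - min(means)

def quadrant_fractions(frames):
    """Fraction of foci whose net displacement (last minus first frame)
    points into each quadrant; y is taken to increase upward."""
    counts = {"UR": 0, "LR": 0, "UL": 0, "LL": 0}
    for (x0, y0), (x1, y1) in zip(frames[0], frames[-1]):
        vertical = "U" if (y1 - y0) >= 0 else "L"
        horizontal = "R" if (x1 - x0) >= 0 else "L"
        counts[vertical + horizontal] += 1
    total = len(frames[0])
    return {name: n / total for name, n in counts.items()}

def keep_nucleus(frames, centers, max_mmgc=0.5, max_maad=0.3, max_quadrant=0.5):
    """Return True if the nucleus passes all three (hypothetical) thresholds."""
    return (mmgc(centers) <= max_mmgc
            and maad(frames, centers) <= max_maad
            and max(quadrant_fractions(frames).values()) <= max_quadrant)
```

In this sketch a nucleus whose foci all drift in the same direction fails the quadrant test even when its center and average focus distance barely change, which mirrors the rationale given for the third parameter.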

Analysis of the selected nuclei showed that dysfunctional telomeres 
traveled a median cumulative distance of 2.5 µm in 10 min 
(Figures 1D and 1E; Table S1; Movie S1A), which is consistent 
with previous data (Dimitrova et al., 2008). The mean square 
displacement (MSD) increased over time, with a final MSD of 
0.3 µm² after 10 min (Figure 1F; Table S1). Fitting of the MSD 



Figure 1. Microtubule Dynamics Promote 
Mobility of Dysfunctional Telomeres 

(A) Schematic of the imaging approach. mCherry-BP1-2 
foci at deprotected telomeres after TRF2 
deletion were traced for 10 min by time-lapse microscopy. 

(B) Immunoblot for TRF2 and phosphorylation of 
Chk2 in TRF2F/F RsCre-ERT2 MEFs at 55-62 hr 
after addition of 4-OH tamoxifen (4-OHT). 

(C) Images of mCherry-53BP1-2 foci with microtubules 
visualized with YFP-α-tubulin (with γ-correction). 

(D) Examples of traces of mCherry-53BP1-2 
foci as described in (B) and (C) and shown in 
Movies S1A-S1D. 

(E and F) Distribution of the cumulative distance 
traveled and MSD with SDs of all the mCherry-BP1-2 
foci detected in the conditions as in (C). Data 
obtained from three independent experiments 
with greater than ten cells/condition. Numbers 
below the data points are averages and SDs of the 
three median values from three independent 
experiments. Bars represent the median of all the 
foci (>1,000) traced. p values are from two-tailed 
Mann-Whitney test. ****p < 0.0001, ***p < 0.001, 
**p < 0.01, *p < 0.05. ns, not significant. 

(G) Percentage of cells discarded (means and 
SDs from three independent experiments). The 
p values were based on unpaired t test. Symbols 
as in (F). 

See also Figure S1 and Table S1. 



measured for dysfunctional telomeres to 
MSD = A + Γt^α showed an anomalous 
diffusion coefficient (α) of close to 1.0 
(Table S1), indicating diffusive motion. 
The calculated diffusion coefficient 
(3.7 × 10⁻⁴ µm²/s; Table S1) is in the 
range observed by others for dysfunctional 
mammalian telomeres (Chen 
et al., 2013; Cho et al., 2014), DNA damage 
lesions formed after UV and IR irradiation 
of mammalian cells (Kruhlak et al., 
2006; Falk et al., 2007; Mahen et al., 
2013; Becker et al., 2014), and a locus 
next to an I-SceI-induced DSB in yeast (Mine-Hattab and Rothstein, 
2012; Dion et al., 2012). 
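
The quantities reported above (cumulative distance traveled, MSD, and the fit of the MSD to A + Γt^α) can be illustrated with the following sketch. It is not the authors' analysis code: the function names, the time-averaged definition of the MSD for a single track, and the fitting strategy (scanning α on a grid and solving for A and Γ by linear least squares at each α) are assumptions chosen to keep the example dependency-free.

```python
import math

# Illustrative sketch of the track statistics; names and the fitting
# strategy are assumptions, not the published analysis pipeline.

def cumulative_distance(track):
    """Sum of step lengths along one 2D track given as [(x, y), ...]."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(track, track[1:]))

def msd(track, max_lag):
    """Time-averaged mean square displacement for lags 1..max_lag (in frames)."""
    values = []
    for lag in range(1, max_lag + 1):
        squares = [(track[i + lag][0] - track[i][0]) ** 2 +
                   (track[i + lag][1] - track[i][1]) ** 2
                   for i in range(len(track) - lag)]
        values.append(sum(squares) / len(squares))
    return values

def fit_msd(times, msd_values, alphas=None):
    """Fit MSD = A + gamma * t**alpha. For a fixed alpha the model is linear
    in A and gamma, so alpha is scanned on a grid and the closed-form
    least-squares solution is kept for the alpha with the smallest error."""
    if alphas is None:
        alphas = [0.05 * k for k in range(2, 41)]  # 0.10 to 2.00
    n = len(times)
    mean_y = sum(msd_values) / n
    best = None
    for alpha in alphas:
        u = [t ** alpha for t in times]
        mean_u = sum(u) / n
        suu = sum((ui - mean_u) ** 2 for ui in u)
        suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, msd_values))
        gamma = suy / suu
        a = mean_y - gamma * mean_u
        sse = sum((a + gamma * ui - yi) ** 2 for ui, yi in zip(u, msd_values))
        if best is None or sse < best[0]:
            best = (sse, a, gamma, alpha)
    _, a, gamma, alpha = best
    return a, gamma, alpha
```

An α near 1 returned by such a fit corresponds to the diffusive behavior described in the text, whereas values below or above 1 would indicate sub- or super-diffusive motion, respectively.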

53BP1 -Dependent Mobility Requires Dynamic 
Microtubules 

We previously showed that the movement of dysfunctional telo- 
meres is not affected by the actin drug, latrunculin A (Dimitrova 
et al., 2008). In contrast, when cells were incubated with the 
microtubule poisons Taxol or nocodazole, which stabilize and 
depolymerize microtubules, respectively (Figure 1C), there was 
a striking reduction in the mobility of the dysfunctional telomeres 
and the distance traveled by the telomeres was significantly 
smaller (Figures 1D-1F; Table S1; Movies S1A-S1C). The effect 
of nocodazole was completely reversed within 1 hr of its removal 
from the media, showing that the lack of dynamic behavior was 

Figure 2. SUN1 and SUN2 Promote Mobility 
of Dysfunctional Telomeres 

(A) Immunoblots for TRF2, SUN1, SUN2, 53BP1, 
and phosphorylated Chk2 in the indicated MEFs 
at 72 hr after Hit&Run Cre. 

(B) Telomere dysfunction-induced foci (TIF) assay 
on the MEFs described in (A). Telomeres were 
detected by FISH with FITC-(CCCTAA)3 probe 
(green). Phosphorylated H2AX (top panel), 53BP1 
(middle panel), and Rif1 (bottom panel) were 
detected by IF (red). DAPI, DNA (blue). 

(C) Quantification of TIF response after Cre as 
assayed in (B). Cells with greater than nine TIFs 
were scored. Values are means and SDs of three 
independent experiments. p values were from an 
unpaired t test (see legend to Figure 1). 

(D) Examples of traces of mCherry-53BP1-2 foci at 
66-72 hr after Cre (see Movies S3A-S3C). 

(E and F) Distribution of the cumulative distance 
traveled and MSDs with SDs of mCherry-BP1-2 
foci in the analyzed MEFs (as in D) in four experiments, 
as described in Figure 1. 

See also Figure S2 and Table S1. 



not due to a permanent toxic effect of the drug (Figures 1D-1F; 
Table S1; Movie S1D). Both microtubule poisons also affected 
the extent to which the nuclei were distorted (Figure 1G; Table 
S1), indicating that much of the nuclear deformation observed 
in these fibroblasts is microtubule-dependent. 

SUN1 and SUN2 Promote the Mobility and NHEJ of 
Dysfunctional Telomeres 

Since the involvement of microtubule dynamics suggested a link 
between the dysfunctional telomeres and the cytoplasm, we 
tested the role of the LINC complex in the movement of dysfunctional 
telomeres. To this end, we used SUN1 and SUN2 KO mice 
(Ding et al., 2007; Lei et al., 2009) to generate immortalized conditional 
TRF2F/F SUN1-/- SUN2-/- MEFs. The absence of the 
two SUN proteins did not interfere with Chk2 phosphorylation 
or the formation of telomere dysfunction-induced foci (TIFs) 
containing γH2AX, 53BP1, and Rif1 after deletion of TRF2, and 
53BP1 was detected by chromatin immunoprecipitation 
(ChIP) at dysfunctional 
telomeres in SUN1/2 DKO cells (Figures 
2A-2C, S2A, and S2B). Nonetheless, the 
SUN1/2-deficient cells showed a significant 
reduction in the mobility of the 
dysfunctional telomeres (Figures 2D-2F 
and S2C; Table S1; Movies S3A and 
S3B). The effect of removal of SUN1 
and SUN2 was at least as strong as the 
effect of absence of 53BP1 monitored in 
parallel experiments (Figures 2D-2F; 
Table S1; Movies S3C and S3B). 

The percentage of nuclei that were discarded 
due to deformation was reduced 
in the absence of SUN1 and SUN2 (Figure 
S2C), implicating the LINC complex 
in the microtubule-mediated changes in 
nuclear shape. In contrast, 53BP1 had no effect on nuclear 
deformation (Figure S2G). 

Importantly, in the TRF2 SUN1/2 TKO cells, the diminished 
mobility of the dysfunctional telomeres was accompanied by a 
reduction in their fusion (Figures 3A and 3B). Metaphase spreads 
of cells lacking SUN1 and SUN2 showed a 2-fold decrease in the 
NHEJ of dysfunctional telomeres at 84 hr (Figures 3B and S2B). 
The reduction in telomere fusions was also apparent from the 
diminished appearance of fused telomeric restriction fragments 
(Figures S2D and S2E). The difference in telomere fusion fre- 
quency with and without the SUN proteins was negligible when 
the assay was saturated at a later time point (108 hr). In contrast, 
in 53BP1-deficient cells telomere fusions remained infrequent at 
later time points, consistent with 53BP1 promoting telomere 
fusions through inhibition of resection as well as SUN1/2-dependent 
mobility. Since the absence of SUN1 alone affected telomere 
fusions less than absence of both SUN1 and SUN2 

Figure 3. The LINC Complex Promotes 
NHEJ of Dysfunctional Telomeres 

(A) Metaphases showing telomere fusions in the 
indicated MEFs at 84 hr after Hit&Run Cre. Telomeres 
were detected by FISH with a FITC-(CCCTAA)3 
probe (green). DNA, DAPI (red). 

(B) Distribution of telomere fusions as in (A) at 84 
and 108 hr after Cre. Dots represent % fusions in 
individual metaphases. Bars represent the median 
of telomere fusions in 15 metaphases for three 
independent experiments (45 metaphases). 
p values from unpaired t test (see legend to Figure 
1). 

(C) In-gel assay for single-stranded telomeric 
DNA. Telomeric overhangs detected in situ with 
end-labeled 32P-(AACCCT)4 in MboI-digested 
genomic DNA from the indicated MEFs at 84 and 
108 hr after TRF2 deletion (top panel). Bottom: the 
DNA was denatured in situ and rehybridized 
with the same probe to determine the total telomere 
DNA. 

(D) Quantification of relative overhang signal as 
detected in (C). Values represent means for four 
independent experiments with SDs. The ss telomeric 
signal was normalized to the total telomeric 
DNA in the same lane. For each MEF line, the 
normalized no Cre value of cells was set at 100 and 
the post-Cre values are given relative to this value. 
Two-way ANOVA for multiple comparisons was 
used to perform statistical analysis. For p value 
symbols see legend to Figure 1. 

(E) Schematic of the LINC complex and microtubules. 

(F and G) Quantification of telomere fusions in 
TRF2F/F MEFs treated with shRNAs to nesprin-4 or 
Kif5B 96 hr after Cre and analyzed as in (A) and (B). 
Bars represent the median % of telomeres fused in 
three independent experiments (20 metaphases 
each). 

(H) Quantification of telomere fusions in TRF2F/F 
RsCre-ERT2 and TRF2F/F Kif3AF/F RsCre-ERT2 
MEFs 72 and 90 hr after 4-OHT, as in (A) and (B). 
See also Figures S2 and S3. 



(Figure S2F), we conclude that the SUN proteins have partially 
overlapping functions in this pathway. 

We verified that the deficiency in telomere fusion in the 
SUN1/2 KO was not due to increased resection using a quantitative 
assay for the amount of ssTTAGGG repeats after deletion of 
TRF2 from SUN1/2-deficient cells. In Lig4-/- MEFs (TRF2/ 
SUN1/SUN2/Lig4 quadruple KO), which are a good system for 
detection of resection because the telomeres remain free, there 
was no great increase in the overhang signal after TRF2 deletion 
(Figures 3C and 3D), indicating that resection remained 
repressed. Parallel deletion of TRF2 from 53BP1-/- Lig4-/- cells 
showed the substantial increase in overhang signal expected 
from the role of 53BP1/Rif1 in repression of resection (Lottersberger 
et al., 2013). These data, together with the normal localization 
of Rif1 at dysfunctional telomeres in SUN1/2 DKO cells 
(Figures 2B and 2C), support the idea that SUN1 and SUN2 
are dispensable for the protection of DSBs from resection and 
act independently from Rif1. We propose, therefore, that SUN1 
and SUN2 promote the c-NHEJ of telomeres by increasing their 
dynamic behavior. 

Nesprin-4 and Kinesins Contribute to NHEJ of 
Dysfunctional Telomeres 

Since the SUN proteins are connected to the cytoskeleton 
through nesprins (reviewed in Starr and Fridolfsson, 2010) (see 
Figure 3E) and SUN1/2-deficient cells lack nesprin-1, nesprin- 
2, nesprin-3, and nesprin-4 at the NE (Crisp et al., 2006; Padma- 
kumar et al., 2005; Ketema et al., 2007; Lei et al., 2009; Roux 
et al., 2009), we tested shRNAs to nesprins for an effect on 
NHEJ of dysfunctional telomeres. Two shRNAs targeting 
nesprin-4 lowered the frequency of telomere fusions without 
affecting cell proliferation or the DDR (Figures 3F, S3A, and S3B). 

As nesprin-4 is known to interact with the plus-end directed 
microtubule motor kinesin-1 (Figure 3E), we tested shRNAs to 
the Kif5B subunit of kinesin-1 for an effect on the NHEJ of 
dysfunctional telomeres. Two shRNAs to Kif5B lowered the frequency 
of telomere fusions at an early time point without 
affecting the proliferation or the DDR upon telomere deprotection 
(Figures 3G, S3C, and S3D). In addition, two shRNAs to 
the kinesin-2 subunit Kif3A resulted in a reduced frequency 
of telomere fusions (Figures S4E-S4G). Since kinesin-2 had 
not previously been shown to cooperate with nesprin-4, we 
generated TRF2F/F Kif3AF/F MEFs to further verify the shRNA 
data. Consistent with the shRNA results, MEFs lacking Kif3A 
showed a significant reduction in the efficiency of telomere 
fusions after TRF2 deletion (Figures 3G, S3H, and S3I). These 
data suggest that 53BP1-mediated mobility of dysfunctional 
telomeres likely involves redundant action by the microtubule 
motors kinesin-1 and kinesin-2, as well as nesprin-4 and 
possibly other nesprins. 

Phosphorylation Sites in 53BP1 Required for Telomere 
Mobility 

As a version of 53BP1 lacking its N-terminal S/TQ sites (53BP1-28A) 
fails to induce chromatin mobility (Lottersberger et al., 
2013), we determined which S/TQ sites are involved in this process. 
We generated a collection of S or T to A mutations at the 
S/TQ positions in a C-terminally truncated version of 53BP1 
that lacks the BRCT domain (53BP1DB; Figure 4A) (Bothmer 
et al., 2011) and behaves like wild-type 53BP1 in the context 
studied here (Lottersberger et al., 2013; Zimmermann et al., 
2013). Through the analysis of the mutants, we identified one 
mutant, referred to as 53BP1ΔMOB, which appeared to be a 
separation-of-function mutant specifically deficient in the ability 
of 53BP1 to promote mobility but proficient in blocking resection 
(Figure 4A). 53BP1ΔMOB recruited Rif1 to sites of DNA damage 
and was able to interact with PTIP, which was expected since the 
region of mutated S/TQ sites falls outside the previously mapped 
Rif1 and PTIP interacting regions (Figures 4A and S4A-S4C) (Munoz 
et al., 2007; Escribano-Diaz et al., 2013). Consistent with its 
binding to Rif1, the 53BP1ΔMOB mutant was proficient in repressing 
hyper-resection at telomeres after TRF2 deletion in 
TRF2F/F 53BP1-/- Lig4-/- cells (Figures S4D and S4E). 

Despite the normal interactions with Rif1 and PTIP, the ability 
of 53BP1ΔMOB to promote telomere fusions upon complementation 
of 53BP1 deficiency was significantly reduced (Figure 4B). 
However, 53BP1ΔMOB promoted telomere fusion similar to 
53BP1DB in SUN1-/- SUN2-/- 53BP1-/- cells (Figures 4B 
and S4F), suggesting that the 53BP1ΔMOB is only deficient in 
a function that requires SUN1/2. Time-lapse imaging showed 
that 53BP1ΔMOB is completely defective in promoting the 
increased mobility of dysfunctional telomeres, resulting in dynamics 
that are indistinguishable from cells transduced with 
the empty vector or the 53BP1Δ28A mutant (Figures 4C and 
S4G; Table S1). In contrast, 53BP1ΔPTIP showed no defect in 
promoting mobility of dysfunctional telomeres (Figures 4C and 
S4G; Table S1). Thus, the ability of 53BP1 to promote mobility 
of dysfunctional telomeres likely involves an interaction that 
depends on phosphorylation of one or more of the S/TQ sites 
in the MOB domain. The identity of the MOB domain interacting 
partner is unknown. It is not excluded that this domain interacts 
with SUN1 and SUN2 but this interaction was not detected by 
mass spectrometry (Di Virgilio et al., 2013) and ChIP failed to 
reveal SUN1 and SUN2 at dysfunctional telomeres (Figure S2A). 

PTIP Is Not Required for 53BP1-Dependent Mobility 

To determine whether PTIP contributes to the 53BP1-dependent 
mobility, TRF2 and PTIP co-deletion in SV40LT immortalized 
TRF2F/F PTIPF/F MEFs was analyzed. Absence of PTIP did not 
affect cell proliferation or the DDR at the dysfunctional telomeres 
(Figures S5A-S5D). In the PTIP-deficient setting, the distances 
traveled and MSD of the dysfunctional telomeres were equal to 
those of PTIP-containing control cells (Figures 4D, 4F, and S5E; 
Table S1; Movies S4A-S4C). Moreover, the analysis of the telomere 
overhangs showed that PTIP deficiency did not affect the 
resection at dysfunctional telomeres (Figures S5F and S5G), 
supporting the previous conclusion that 53BP1-dependent protection 
from resection is primarily dependent on Rif1 (Zimmermann 
and de Lange, 2014). Nonetheless, as previously shown 
(Callen et al., 2013), telomere fusions appeared slightly delayed 
when PTIP was deleted (Figures 4G, S5F, and S5G). Consistent 
with these results, the 53BP1ΔPTIP mutant displayed a mild 
defect in promoting telomere fusions but appeared unaffected 
with regard to protection from resection and the induction of 
mobility (Figures 4B, 4C, S4D, and S4E). 

53BP1/LINC/Microtubule-Dependent Mobility of IR-Induced DSBs

Despite their resemblance to DSBs, dysfunctional telomeres
could be argued to be different from chromosome-internal
DNA breaks. We therefore tested whether genome-wide DSBs
are subject to the 53BP1/LINC/microtubule-dependent changes
in dynamics. To this end, we analyzed the mobility of the
mCherry-BP1-2 foci after induction of ~100 DSBs with 2.75 Gy
IR (Rothkamm and Lobrich, 2003) in wild-type,
SUN1-/-SUN2-/-, and 53BP1-/- MEFs. As expected, the
Chk2 phosphorylation and formation of γ-H2AX foci were not
affected by the genotype of the cells (Figures 5A-5C). The
IR-induced mCherry-BP1-2 foci showed a cumulative distance
traveled and an MSD comparable to the MSD of dysfunctional
telomeres. This dynamic behavior was strongly diminished in
the absence of 53BP1 or the SUN proteins and upon treatment
with Taxol (Figures 5D-5G; Table S1; Movies S5A-S5D).
Therefore, we conclude that the 53BP1/LINC/microtubule pathway
promotes the mobility of chromosome-internal DSBs as it does
at dysfunctional telomeres.

Undamaged Chromatin Is Minimally Affected by DSBs 

Figure 4. The Mobility Domain of 53BP1, but Not PTIP, Is Required for Mobility of Dysfunctional Telomeres

(A) Schematic of 53BP1, S/TQ site mutations, and their phenotypes.

(B) Quantification of telomere fusions in the indicated MEFs complemented with the indicated 53BP1 alleles 96 hr after TRF2 deletion with Hit&Run Cre (as in
Figure 3). Data from >70 metaphases analyzed in four independent experiments. For each experiment, the median fusion frequency for 53BP1DB was set to 100
and all other values were normalized to this frequency.

(C) MSDs with SDs of mCherry-BP1-2 foci detected in the TRF2-deleted 53BP1-/- RsCre-ER^'' MEFs expressing the indicated 53BP1 alleles. Data from three
independent experiments.

(D) Examples of traces of mCherry-BP1-2 foci at 66-72 hr after Cre in the indicated MEFs (see Movies S4A-S4C).

(E and F) Distribution of the cumulative distance traveled and MSDs with SEMs of mCherry-BP1-2 foci in the indicated MEFs (as in Figure 1). Bars represent
medians of the cumulative distance traveled by >500 foci in two experiments and numbers indicate the averages and SEMs of the two median values obtained in
two independent experiments.

(G) Quantification of telomere fusions in the indicated MEFs at 84 and 108 hr after Cre (as in Figure 3).

See also Figures S4 and S5 and Table S1.

We next asked whether the presence of mobile DSBs changes
the dynamics of the global chromatin. To address this question,
we monitored the mobility of fully functional telomeres, marked
with eGFP-TRF1, in cells with and without IR-induced DSBs.
The IR was delivered at 2.75 Gy, which induces ~1 DSB/60
Mb (~100 DSBs per cell, see above). Since the 80 telomeres of
the mouse genome represent ~4 Mb (~0.1% of the genome),
telomeres are not expected to contain DSBs after 2.75 Gy.
Nonetheless, the eGFP-marked telomeres showed a very slight
but statistically significant increase in the cumulative distance
traveled (Figure S6). Moreover, their MSD and diffusion
coefficient were slightly increased, although much less than when
the telomeres were dysfunctional (Figure S6; Table S1; Movies
S6A-S6D). These results indicate that while the change in chromatin
dynamics primarily affects sites of DNA damage, there is also
a minor increase in the mobility of undamaged chromatin,
consistent with a previous report (Zidovska et al., 2013).
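
The dose arithmetic in this paragraph can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the three input numbers are the approximate values quoted in the text, and the calculation assumes that IR-induced DSBs are spread uniformly over the genome, which the text implies but does not state.

```python
# Back-of-the-envelope check of the dose arithmetic quoted in the text.
# Assumption (not stated in the text): DSBs are distributed uniformly
# along the genome, so expected breaks scale with DNA length.

DSBS_PER_CELL = 100   # ~100 DSBs per cell after 2.75 Gy (from the text)
MB_PER_DSB = 60       # ~1 DSB per 60 Mb (from the text)
TELOMERIC_MB = 4      # ~4 Mb of telomeric DNA in total (from the text)

# Genome size implied by the two dose figures (Mb per cell).
genome_mb = DSBS_PER_CELL * MB_PER_DSB

# Fraction of the genome that is telomeric DNA.
telomeric_fraction = TELOMERIC_MB / genome_mb

# Expected number of DSBs falling inside telomeres, per cell.
expected_telomeric_dsbs = TELOMERIC_MB / MB_PER_DSB

print(f"Implied genome size: {genome_mb} Mb")
print(f"Telomeric fraction of genome: {telomeric_fraction:.2%}")
print(f"Expected telomeric DSBs per cell: {expected_telomeric_dsbs:.3f}")
```

Under these assumptions fewer than one cell in ten would carry even a single telomeric DSB, which is the basis for treating eGFP-TRF1 foci as a proxy for undamaged chromatin.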

When the eGFP-TRF1 marker was used to detect nuclear
deformations, the incidence of distorted nuclei was not affected by
deletion of TRF2 (Figure S6B; Table S1), indicating that
microtubule dynamics distort nuclei regardless of the presence of DNA
damage.

Chromatin Mobility Promotes DSB Mis-repair in BRCA1- 
Deficient Cells 

We considered that for genome-wide DSBs, the increased
mobility of the chromatin could promote the joining of unrepaired
DNA ends that are at a distance. One setting in which this process
may be relevant is the formation of radial chromosomes in
PARPi-treated BRCA1-deficient cells. Radial formation involves the
joining of a DNA end from one chromosome with a break in another
chromosome, which may be at a distance and therefore would
require spatial exploration for joining. We therefore tested
whether the 53BP1-dependent mobility contributes to the
mis-rejoining when many S phase DSBs are induced with PARPi and
HDR is impaired. Experiments with cells containing fluorescently
labeled geminin to reveal their cell-cycle stage showed that
IR-induced DSBs become mobile in S/G2 as well as in G1
(Figures S7A-S7D).

Figure 5. 53BP1/LINC/Microtubule-Promoted Mobility of IR-Induced DSBs

(A) Immunoblot for phosphorylation of Chk2 (as in Figure 2A) in the
indicated MEFs at 1 hr after 2.75 Gy IR.

(B) IF for γ-H2AX (green) and 53BP1 (red) for cells treated as in (A).
DAPI, DNA (blue).

(C) Quantification of IR-induced γ-H2AX and 53BP1 foci as assayed in (B).

(D) Examples of 10 min traces of mCherry-BP1-2 foci at 1 hr after IR of
the cells described in (A) with or without 20 μM Taxol (see Movies
S5A-S5D).

(E-G) Percentage of cells discarded, distribution of the cumulative
distance traveled, and MSDs with SDs of mCherry-BP1-2 foci detected as (D)
and (E) (as in Figure 1). Data from three independent experiments.

See also Figure S6 and Table S1.

As previously shown, when BRCA1 shRNA-treated cells were incubated
with the PARP inhibitor olaparib, a significant number of
mis-rejoined chromosomes was formed and this phenotype
was repressed by deletion of 53BP1 (Figures 6A-6C). Importantly,
the formation of aberrantly repaired chromosomes was also
diminished in SUN1-/-SUN2-/- MEFs (Figures 6A-6C), and the
mis-rejoining events were strongly reduced by Taxol (Figure 6D).
The effect of Taxol was not due to diminished PARP inhibition,
since PARPi+Taxol-treated cells showed no PARsylation in
response to H2O2 (Figure S7E). Taxol did not further reduce
either the mobility or the chromosome mis-rejoining events in
the absence of SUN1 and SUN2 (Figures 6E and S7F-S7H; Table
S1), supporting the view that the SUN proteins and microtubules
act in the same pathway to promote chromatin mobility and
aberrant DNA repair. Importantly, SUN1/2 deficiency also
diminished the lethality of PARPi treatment in BRCA1-deficient cells
(Figures 6F and S7I). As expected, the absence of 53BP1
rescued the lethality of PARPi treatment to a greater extent,
consistent with the multiple mechanisms by which 53BP1 affects
DSB repair (Figure 6G).

Figure 6. SUN1/2 and Dynamic Microtubules Promote Radial Formation

(A) Immunoblots for BRCA1 and γ-tubulin in the indicated MEFs (as in
Figure 2A) at 144 hr after infection with BRCA1 shRNA or empty vector.
Olaparib was added 16 hr before analysis.

(B) Representative mis-rejoined chromosomes (arrowheads). DNA stained
with DAPI.

(C) Quantification of mis-rejoined chromosomes in the indicated MEFs
(as in A), analyzed as in (B). Each dot represents a metaphase. Bars
represent the median of mis-rejoined chromosomes in three independent
experiments (10 metaphases each). p values as in Figures 1A and 3B.

(D) Quantification of mis-rejoined chromosomes in the indicated MEFs
after 18 hr with or without Taxol, as in (C).

(E) Quantification of mis-rejoined chromosomes in each metaphase in the
indicated MEFs with or without Taxol as described in (C) and (D). All
cells used in (A)-(F) are TRF2F/F.

(F) Quantification of colony formation in the indicated cells infected
with BRCA1 shRNA and treated with or without olaparib for 7 days. The
curves represent the average and SEMs of two independent experiments.

(G) Schematic of the role of 53BP1 in NHEJ of distant DSBs. In addition
to controlling DNA end processing, 53BP1 can affect NHEJ by increasing
the mobility of DSBs. The mobility of DSBs is dependent on the LINC
complex and microtubule dynamics. Dashed arrows indicate the possibility
that the DDR affects the LINC complex and microtubules independent of
53BP1.

See also Figure S7.

DISCUSSION 

These results establish that DSBs show altered dynamic behavior
in mammalian nuclei. The mobility and roaming of damaged
chromatin requires the MOB domain in 53BP1, the SUN1/2
components of the LINC complex, and dynamic microtubules. In
addition, data on telomere fusions implicated plus-end-directed
microtubule motors (kinesin-1 and kinesin-2) and at least one of
the nesprin proteins in this process. The LINC complex
contributes to the dynamic behavior of specific chromosomal loci,
including telomeres, during bouquet formation in many
eukaryotes (reviewed in Shibuya and Watanabe, 2014). However, the
process acting on DSBs is different from bouquet formation.
While the bouquet configuration bundles loci at one area of the
NE in preparation for meiosis I, the DSB mobility recorded here
is not overtly associated with clustering or NE targeting.

In the experimental settings analyzed 
here, the spatial exploration of DSBs 
promotes their pathological joining 
by NHEJ. DSB mobility enhanced 
telomere-telomere fusions forming 
dangerous dicentric chromosomes and similarly, it promoted 
the mis-repair of PARPi-induced DSBs generating lethal radial 
chromosomes. Given these fatal outcomes, a major question is 
why this pathway is allowed to act on DSBs. Below, we pro- 
pose that the enhanced mobility of DSBs represents a mech- 
anism to restore the connection between DNA ends that 
have lost their proper interaction. We argue that this mecha- 
nism can counteract ectopic repair when DSBs are rare, as 
is the case under physiological conditions. On the other 
hand, DSB mobility will promote mis-repair under experimental 
conditions when a high number of DSBs are generated at the 
same time. 



How DSB Mobility Could Prevent Repair Errors in G1 
and S/G2 

It is reasonable to assume that 53BP1 did not evolve to promote
the fusion of dysfunctional telomeres and mis-repair of DSBs in
PARPi-treated BRCA1-deficient cells. Instead, we propose that
53BP1 has gained the ability to promote DSB mobility to
facilitate correct repair (Figure 7). We imagine two settings where
increased chromatin mobility at a DSB would be advantageous.
The first setting is in G1 when a DSB is formed and its repair by
Ku70/80-dependent c-NHEJ is the preferred mechanism to
re-establish the integrity of the genome (Figure 7A). If Ku loading
fails or synapsis does not occur, the DNA ends might become
spatially separated. For instance, chromatin remodeling and
nucleosome eviction at DSBs (reviewed in Peterson and
Almouzni, 2013) may drive the two DNA ends apart. If the
separated ends are mobile, their increased spatial exploration could
reconnect them and promote their joining.

The second setting in which mobility of damaged chromatin 
could prevent repair errors is after DNA replication (Figure 7B). 



Figure 7. Proposed Function and Mechanism of 53BP1-Dependent Mobility of DSBs

(A and B) Proposed function for 53BP1-dependent mobility in promoting
correct DSB repair. (A) G1: mobility of DNA ends that have lost their
association could promote their rejoining, thereby promoting NHEJ.
(B) S/G2: if a DNA end loses connection with the sister chromatid and
invades an ectopic locus, DSB mobility could disrupt this aberrant
interaction and promote correct HDR. If the DSB is being repaired
correctly using HDR on the sister chromatid, mobility will not
dissociate the ends because of the presence of cohesin and base-pairing.

(C) Proposed models for the mechanism of 53BP1/LINC/microtubule-dependent
mobility of DSBs. The enlarged part of the nucleus shows 53BP1 (red) at
a DSB with the ends separated. One end (top) portrays a model in which
53BP1 has a physical connection with the LINC complex (green). The LINC
complex connects to dynamic microtubules and thereby moves the
LINC-bound 53BP1-covered DNA end. The other end (bottom) portrays a
model in which there is no physical connection between the LINC complex
and 53BP1. The LINC complex associates with microtubules that “poke”
the nucleus. The 53BP1-associated chromatin moves more readily even
when not at the periphery, perhaps because 53BP1 alters the flexibility
of the chromatin fiber. See text for discussion.



In S/G2, DSBs can be repaired by HDR 
using the sister chromatid as the tem- 
plate. However, if the DNA topology is un- 
favorable, one DNA end (or both) could 
lose its attachment to the sister chro- 
matid and initiate ectopic repair on a 
different locus (Figure 7B). Mobility of 
the chromatin near the DSB could help 
to disconnect the wandering DNA end 
from an ectopic locus where it is not 
held down by cohesin and where base- 
pairing will be limited. In contrast, chro- 
matin mobility of DSBs is less likely to 
interrupt HDR on the sister because of the stabilizing effects of 
cohesion and base-pairing. 

The proposed role of DSB mobility in counteracting ectopic in- 
teractions is analogous to what has been proposed for the 
mobility of the chromosome pairing centers in Caenorhabditis 
elegans meiosis (Sato et al., 2009). Sato et al. (2009) argued
that this process preferentially disrupts pairing of non-homologous
chromosomes since paired homologs will have a greater
ability to resist forces. Although the system described here is 
different from the meiotic events, both regulatory pathways 
may have evolved to provide a mechanism aimed to distinguish 
weak non-homologous interactions from the stronger connec- 
tion afforded by homology. 

A key consideration with regard to the role of 53BP1 in DSB
repair is that the mammalian DDR did not evolve to handle
hundreds of DSBs occurring at the same time. In vivo, the
majority of cells in primate brain and liver show no evidence of
DSBs and only 10% of the cells have one or two 53BP1 foci
(Fumagalli et al., 2012; U. Herbig, personal communication),
indicating that the occurrence of multiple DSBs in one nucleus
is rare in post-mitotic tissues. Furthermore, in MEFs that are in
S phase, where DSBs are expected to be more frequent, <20%
of the nuclei have five or more 53BP1 foci and none showed
more than ten (Wu et al., 2010). This number of potential
S phase DSBs may be an overestimate because 53BP1 foci
can form at a variety of DNA lesions. These observations argue
that the 53BP1-mediated mobility of DSBs is unlikely to cause
chromosomal aberrations unless cells experience an
exogenous genotoxic insult.

Models for the Mechanism by Which DSB Mobility Is 
Generated 

We are considering two general types of models for how 53BP1,
the LINC complex, and microtubules promote mobility
(Figure 7C). In the first model, there is a physical connection between
the 53BP1-marked chromatin and a LINC complex that interacts
with microtubules. In the second model, no such connection
exists.

Although we have not been able to establish a physical
interaction between 53BP1 and the SUN proteins, it is not excluded
that 53BP1 directs DSBs to the LINC complex. If 53BP1
interacts with the LINC complex, kinesin- and microtubule-dependent
mobility of the LINC complex could alter the dynamic
behavior of DSBs. The lack of clear peripheral localization of
DSBs is not a strong argument against this model since the
nuclei we have studied are flat, positioning most of the
chromatin fairly close to the NE. Furthermore, NE invaginations
could allow a connection of a non-peripheral DSB with the
LINC complex. We note that the recorded trajectories and the
diffusive behavior of DSBs gleaned from the MSD curves argue
against the direct interaction model. However, if the
engagement is short-lived and takes place in iterative rapid steps,
the outcome may resemble diffusive behavior rather than directed
movement.

Nonetheless, we favor a second type of model in which no
physical connection occurs between 53BP1 and the LINC
complex. In this model, the role of the LINC complex is to
transduce microtubule forces onto the chromatin in an untar-
geted manner. This process may be analogous to the micro- 
tubule-mediated fenestration of the nuclear envelope in 
prophase, which is in part mediated by the SUN proteins 
(Turgay et al., 2014). Random “poking” of the nucleus in 
response to DNA damage would explain why the global chro- 
matin becomes slightly more dynamic in cells with DSBs but 
how this process is activated by the DNA damage response 
remains to be determined. It is also unclear whether the 
visco-elastic properties of chromatin and the resistance of 
the lamin network allow force propagation over the required 
distance. 

How could microtubule forces specifically increase the
mobility of DNA-damaged loci in the absence of a connection
between 53BP1 and the LINC complex? The simplest explanation
would be that 53BP1, through a factor that binds to the MOB



domain, changes the flexibility of the chromatin fiber containing 
the DSB. Increased flexibility of the large chromatin domain con- 
taining 53BP1 could render it more sensitive to the microtubule 
forces transduced through the NE. Indeed, chromatin that con- 
tains DSBs shows a decreased density as determined by EM 
and appears to expand (Kruhlak et al., 2006), attributes that 
could be consistent with a change in the flexibility of the chro- 
matin fibers. 

Implications 

This study revealed that mammalian cells use microtubules in
the cytoplasm to promote the mobility of sites of DNA damage
in the nucleus. Although some of the molecular details of this
process remain to be determined, the main players, including
the MOB domain of 53BP1, the LINC complex, kinesins, and
microtubules, are now known, allowing further investigation.
The results show that in cells with many DSBs, the induced
mobility of the damaged chromatin can promote aberrant
DSB repair events, including the fusion of dysfunctional
telomeres and formation of radial chromosomes in PARPi-treated
BRCA1-deficient cells. Two main issues warrant attention in
the near future. First, one prediction from our findings is that
curbing microtubule dynamics with taxanes might limit the
efficacy of PARPi treatment of HR-deficient cancers. Thus,
when a combination of taxanes with olaparib or other
DNA-damaging agents (e.g., platin drugs) is being considered, the
effect of taxanes on the efficacy of genotoxic drugs merits
further testing. Second, it will be of interest to test our proposal
that the 53BP1-dependent mobility of DSBs can prevent DNA
repair errors under normal physiological settings when DSBs
are rare.

EXPERIMENTAL PROCEDURES 

Live-Cell Imaging and Identification of Distorted Nuclei

Dysfunctional telomeres were visualized using mCherry-BP1-2 as described
previously (Dimitrova et al., 2008). Images were deconvolved and 2D
maximum intensity projection images were obtained using SoftWoRx
software. Tracking of mCherry-BP1-2 foci was performed with ImageJ software
on at least ten cells per condition. Cells were registered by the StackReg plugin
using Rigid Body (Thevenaz et al., 1998) and particles were tracked using the
Mosaic Particle Detector and Tracker plugin (Sbalzarini and Koumoutsakos,
2005) with the following parameters for particle detection and tracking:
radius = 1-2 pixels; cutoff = 1-2 pixels; percentile = 6; link range = 1;
displacement = 5 pixels. The x and y coordinates of each trajectory were used for
further calculation. All mCherry-BP1-2 foci in a cell that were continuously
tracked for at least 19 out of 20 frames were analyzed. The analysis of
the eGFP-TRF1-marked telomeres was similarly conducted using the
following parameters: radius = 1 pixel; cutoff = 1 pixel; percentile = 8-12;
link range = 1; displacement = 5 pixels.

The average x and y values of all the foci were calculated in each frame as the
geometrical center (GC) and normalized over the GC at t = 0. The distance traveled
by the GC between each pair of time points t = b and t = a was calculated as the
movement of the geometrical center (MGC),
and the maximal MGC (MMGC) for each cell was identified. Cells were
discarded if their MMGC exceeded the arbitrary threshold of 2, or if their
MMGC exceeded the secondary threshold of 1 and another parameter was
also above threshold.
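
The drift check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation (which used ImageJ output): the MGC equation itself is not reproduced in this text, so the sketch assumes a plain Euclidean distance between GC positions, and the function names and the array layout `(n_foci, n_frames, 2)` are hypothetical. The discard rule is left out because the units of the thresholds and the "other parameter" are not defined here.

```python
import numpy as np


def geometric_center(tracks):
    """Per-frame mean position of all foci, normalized to frame 0.

    tracks: array of shape (n_foci, n_frames, 2) holding x, y positions.
    Returns an array of shape (n_frames, 2).
    """
    tracks = np.asarray(tracks, dtype=float)
    gc = tracks.mean(axis=0)   # average over foci, frame by frame
    return gc - gc[0]          # express relative to the GC at t = 0


def max_gc_movement(tracks):
    """Largest distance between GC positions at any two time points.

    Assumption: "movement" is the Euclidean distance between the GC at
    time a and the GC at time b.
    """
    gc = geometric_center(tracks)
    deltas = gc[:, None, :] - gc[None, :, :]   # all (a, b) frame pairs
    return float(np.sqrt((deltas ** 2).sum(axis=-1)).max())


# Toy example: three foci that all shift by (3, 4) between two frames,
# i.e. the whole nucleus moved while the foci kept their arrangement.
demo = np.array([
    [[0.0, 0.0], [3.0, 4.0]],
    [[10.0, 0.0], [13.0, 4.0]],
    [[0.0, 10.0], [3.0, 14.0]],
])
print(max_gc_movement(demo))
```

A large value flags cells whose foci moved together, as expected for drift or rotation of the whole nucleus rather than for local chromatin mobility.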
and the maximal AAD (MAAD) for each cell was identified. Cells were
discarded if MAAD exceeded the arbitrary threshold of 2, or if MAAD exceeded
the secondary threshold of 1 and another parameter was also above threshold.

Finally, the trajectories traveled by each focus i per cell, relative to the GC,
were normalized to the coordinates x^i_{t=0} and y^i_{t=0} and projected together on an
XY plane. The percentage of foci in each quadrant was calculated for each time
frame: upper right (UR(%)), lower right (LR(%)), upper left (UL(%)), lower left
(LL(%)) and the average of these values during the time lapse was derived.
Laterality (LAT (%)), verticality (VER (%)), and diagonality (DIA (%)) were calculated
for each time frame as:



were performed using previously published standard procedures. The
mutated 53BP1 alleles were as follows: 53BP1ΔPTIP (S6A, S13A, S25A,
S29A) and 53BP1ΔMOB (S674A, T696A, S698A, S784A, S831A, T855A,
S892A, S1068A, S1086A, S1104A, T1148A, S1171A, S1219A).

Detailed experimental procedures are given in the Supplemental
Experimental Procedures.
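$$\mathrm{MSD}(\Delta t) = \frac{1}{n}\sum_{i=1}^{n} D_i(\Delta t)^2$$

where

$$D_i(\Delta t)^2 = \left((x^i_t - x^{GC}_t) - (x^i_{t+\Delta t} - x^{GC}_{t+\Delta t})\right)^2 + \left((y^i_t - y^{GC}_t) - (y^i_{t+\Delta t} - y^{GC}_{t+\Delta t})\right)^2.$$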

All data output in pixels (standard ImageJ output) were converted to micrometers
by the formula 1 pixel = 0.215 μm, based on the characteristics of the
objective.

Diffusion coefficient D was calculated as 
D = m/4, 

where m is the slope of the MSD after fitting to a linear curve. The anomalous
diffusion coefficient α was derived using MATLAB by the fitting of MSD to the
diffusion model function:

$$\mathrm{MSD} = A + \Gamma t^{\alpha}.$$

For cumulative distance, statistical analysis was performed using Prism 
Software applying the Mann-Whitney test. 
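
The MSD, the diffusion coefficient D = m/4, and the anomalous exponent can be sketched with NumPy as below. This is a minimal illustration, not the authors' pipeline (which used ImageJ output and MATLAB fitting). Several choices are assumptions: the function names and array layout are hypothetical, displacements are averaged over all foci and all start times, the frame interval must be supplied by the caller because it is not given in this section, and the exponent is estimated from a log-log line fit that treats the MSD as a pure power law with no additive offset term.

```python
import numpy as np

PIXEL_UM = 0.215  # micrometers per pixel, as stated in the text


def msd_curve(tracks_px, gc_px, max_lag):
    """MSD (in square micrometers) for lags of 1..max_lag frames.

    tracks_px: (n_foci, n_frames, 2) focus positions in pixels.
    gc_px:     (n_frames, 2) geometrical center in pixels.
    Positions are taken relative to the GC before displacements are
    computed. Assumption: average over all foci and all start times.
    """
    rel = (np.asarray(tracks_px, float) - np.asarray(gc_px, float)[None]) * PIXEL_UM
    msd = []
    for lag in range(1, max_lag + 1):
        disp = rel[:, lag:, :] - rel[:, :-lag, :]   # displacement over `lag`
        msd.append((disp ** 2).sum(axis=-1).mean())
    return np.array(msd)


def diffusion_coefficient(lags, msd):
    """D = m / 4, where m is the slope of a straight-line fit to the MSD."""
    slope, _intercept = np.polyfit(lags, msd, 1)
    return slope / 4.0


def anomalous_exponent(lags, msd):
    """Exponent from a log-log line fit (assumes MSD proportional to t**alpha)."""
    alpha, _log_prefactor = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha


# Synthetic check: a curve built with D = 0.5 and one with alpha = 0.7.
lags = np.array([1.0, 2.0, 3.0, 4.0])
print(diffusion_coefficient(lags, 4 * 0.5 * lags))
print(anomalous_exponent(lags, 3.0 * lags ** 0.7))
```

The factor of 4 reflects two-dimensional diffusion, which matches the use of 2D maximum intensity projections for tracking. An exponent below 1 indicates subdiffusive (constrained) motion and an exponent near 1 indicates free diffusion.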

Other Experimental Procedures 

All procedures for derivation of MEFs, cell treatments, plasmids, shRNAs, 
immunoblotting, IF, IF-FISH, analysis of metaphase chromosomes, in-gel 
analysis of telomeric DNA, co-immunoprecipitation, ChIP, and mutagenesis 
SUMMARY

A deficiency in pejvakin, a protein of unknown function,
causes a strikingly heterogeneous form of human
deafness. Pejvakin-deficient (Pjvk-/-) mice
also exhibit variable auditory phenotypes. Correlation
between their hearing thresholds and the number
of pups per cage suggests a possible harmful
effect of pup vocalizations. Direct sound or electrical
stimulation shows that the cochlear sensory hair cells
and auditory pathway neurons of Pjvk-/- mice and
patients are exceptionally vulnerable to sound.
Subcellular analysis revealed that pejvakin is associated
with peroxisomes and required for their
oxidative-stress-induced proliferation. Pjvk-/- cochleas
display features of marked oxidative stress and
impaired antioxidant defenses, and peroxisomes in
Pjvk-/- hair cells show structural abnormalities
after the onset of hearing. Noise exposure rapidly
upregulates Pjvk cochlear transcription in wild-type
mice and triggers peroxisome proliferation in
hair cells and primary auditory neurons. Our results
reveal that the antioxidant activity of peroxisomes
protects the auditory system against noise-induced
damage.
INTRODUCTION

Mutations of PJVK, which encodes pejvakin, a protein of
unknown function present only in vertebrates, cause the DFNB59
recessive form of sensorineural hearing impairment. In the first
patients described (Delmaghani et al., 2006), the impairment
was restricted to neurons of the auditory pathway, with auditory
brainstem responses (ABRs) displaying abnormally decreased
wave amplitudes and increased inter-wave latencies (Starr and
Rance, 2015). ABRs monitor the electrical response of auditory
pathways to brief sound stimuli, from the primary auditory
neurons synapsing with the sensory cells of the cochlea, the inner
hair cells (IHCs), to the colliculus in the midbrain (Moller and
Jannetta, 1983). However, some DFNB59 patients were found to
have a cochlear dysfunction, as shown by an absence of the
otoacoustic emissions (OAEs) that are produced by the outer
hair cells (OHCs), frequency-tuned cells endowed with
electromotility that mechanically amplify the sound stimulation of
neighboring IHCs (Ashmore, 2008). These patients had truncating
mutations of PJVK, whereas the previously identified patients,
with extant OAEs, had missense mutations (p.T54I or p.R183W)
(Ebermann et al., 2007; Schwander et al., 2007; Borck et al.,
2012). However, the identification of patients also carrying the
p.R183W missense mutation but lacking OAEs (Collin et al.,
2007) refuted any straightforward connection between the nature
of the PJVK mutation and the hearing phenotype. The severity of
deafness in DFNB59 patients varies from moderate to profound
Figure 1. Hearing Loss Variability and Greater Sensitivity to
Controlled Sound Exposure in Pjvk-/- Mice

(A) ABR thresholds at 10 kHz in P30 Pjvk+/+ (n = 26 mice) and Pjvk-/- (n = 48
mice) littermates.

(B) DPOAE thresholds at 10 kHz in P30 Pjvk+/+ (n = 14 mice) and Pjvk-/- (n = 48
mice) littermates. In ears with no DPOAE, even at 75 dB SPL (the highest sound
intensity tested), DPOAE thresholds were arbitrarily set at 80 dB SPL.

(C) Relationship between the number of pups raised together (determining
sound levels in the immediate environment) and ABR thresholds at 10 kHz in
P21 Pjvk-/- pups. Inset: a time-frequency analysis of a mouse pup's
vocalization. Pup calls from P0 to P21 form harmonic series of about 5 kHz, with the
most energetic harmonic at about 10 kHz. In a 12-pup litter, call levels reach
105 ± 5 dB SPL at the entrance to the ear canals of the pups.

(D) ABR thresholds at 10 kHz in P30 Pjvk+/+ and Pjvk-/- mice before (dots) and
after (crosses) controlled sound exposure. ns, not significant; ***p < 0.001.
See also Figure S1.



and may even be progressive in some patients, suggesting that 
extrinsic factors may influence the hearing phenotype. 

We investigated the role of pejvakin, with the aim of determining 
the origin of the phenotypic variability of the DFNB59 form of
deafness. Our study of Pjvk knockout mouse models and of patients
revealed an unprecedented hypervulnerability of auditory hair 
cells and neurons to sound exposure, accounting for phenotypic 
variability. We found that pejvakin is a peroxisome-associated 
protein involved in the oxidative-stress-induced proliferation of 
this organelle. Pejvakin-deficient mice revealed the key role of per- 
oxisomes in the redox homeostasis of the auditory system and in 
the protection against noise-induced hearing loss. 

RESULTS 

Heterogeneity in the Hearing Sensitivity of Pjvk-/- Mice

We generated pejvakin-null (Pjvk-/-) mice carrying a deletion of
Pjvk exon 2, resulting in a frameshift at codon position 71
(p.Gly71fs*9) (Figure S1; see the Supplemental Experimental
Procedures). ABR thresholds recorded on postnatal day 30
(P30) Pjvk-/- mice (n = 48) ranged from 35 to 110 dB SPL (sound
pressure level) at 10 kHz but never exceeded 30 dB SPL in their
Pjvk+/+ littermates (n = 26) (Figure 1A). This broad range of
hearing sensitivity in Pjvk-/- mice, from near-normal hearing to
almost complete deafness, extended across the whole
frequency spectrum. The thresholds of distortion-product OAEs
(DPOAEs) at 10 kHz (i.e., the minimum stimulus required for
DPOAE production by OHCs) also fell within an abnormally
large range of values, from 30 to 75 dB SPL, in 28 Pjvk-/-
mice, indicating an OHC dysfunction, and DPOAEs were
undetectable in another 20 Pjvk-/- mice, suggesting a complete
OHC defect (Figure 1B). The absence of pejvakin in mice thus
results in a puzzlingly large degree of hearing phenotype variability.

Hypervulnerability to the Natural Acoustic Environment
in Pjvk-/- Mice

We investigated the variability of Pjvk-/- auditory phenotypes
by first determining the ABR thresholds of Pjvk-/- littermates
from different crosses. Large differences were observed
between crosses, with much smaller differences among the Pjvk-/-
littermates of individual crosses. Litters with larger numbers of
pups (6 to 12) had higher ABR thresholds, suggesting that the
natural acoustic environment, with the calls of larger numbers
of pups, might be deleterious in Pjvk-/- mice. Pups are vocally
very active from birth to about P20. We manipulated the level
of exposure to pup calls by randomly splitting large litters of
Pjvk-/- pups into groups of 2, 4, 6, and 10 pups per cage, with
foster mothers, before P10, i.e., several days before hearing
onset. The ABR thresholds at P21 were significantly correlated
with the number of pups raised together (p < 0.001, r² = 0.51)
(Figure 1C).

We then evaluated the effect of a controlled sound stimulation
on hearing, by presenting 1,000 tone bursts at 10 kHz, 105 dB
SPL (2-ms plateau stimulations separated by 60-ms intervals
of silence), energetically equivalent to a 3-min stay in the natural
environment of a 12-pup litter, while monitoring the ABRs during
sound exposure. These conditions are referred to hereafter as
“controlled sound exposure.” We probed the effect of sound
exposure by ABR tests, which, limited to 50 repetitions of tone
bursts, did not influence the hearing thresholds of Pjvk-/-
mice. In a sample of P30 Pjvk-/- mice with initial ABR threshold
elevation (below 35 dB SPL), controlled sound exposure affected
ABR thresholds in the 12-20 kHz frequency interval
(corresponding to the cochlear zones in which hair-cell stimulation was
strongest), with an immediate increase of 21.7 ± 10.3 dB (n =
8; p < 0.001), not observed in Pjvk+/+ mice (2.2 ± 2.4 dB, n =
12; p = 0.3) (Figure 1D). Pjvk-/- mice transferred to a silent
environment after exposure displayed a further increase of
33.7 ± 16.0 dB (n = 8) 2 days after exposure. The threshold shift
decreased to 23.7 ± 18.0 dB at 7 days, and disappeared entirely
by 14 days. When exposed mice were returned to the box with
their littermates, their ABR thresholds continued to increase, at a rate of
15 dB per week. Pejvakin deficiency thus results in particularly
high levels of vulnerability to low levels of acoustic energy, and
the increase in ABR thresholds is reversible but only slowly
and in a quiet environment.
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Hair Cells and Auditory Pathway Neurons Are Affected 
by Pejvakin Deficiency 

To identify the cellular targets of the pejvakin deficiency, we
specifically probed the function of auditory hair cells and neurons in
Pjvk-/-, hair cell-conditional Pjvk knockout (Pjvkfl/fl Myo15-cre+/-),
and Pjvk+/+ mice, at the age of 3 weeks, before and after controlled
sound exposure or controlled electrical stimulation. The responses of
the IHCs to sound-induced vibrations amplified by OHCs trigger action
potentials in the distal part of primary auditory neurons, at the origin
of ABR wave I. In Pjvkfl/fl Myo15-cre+/- mice, which lack pejvakin only
in the hair cells, ABR wave I amplitude and latency at 105 dB SPL
specifically probed IHC function, because IHC responses to such loud
sounds are independent of OHC activity (Robles and Ruggero, 2001). The
larger wave I latency (1.58 ms in Pjvkfl/fl Myo15-cre+/- mice [n = 20]
versus 1.32 ms in Pjvk+/+ littermates [n = 30]; p < 0.001) and lower
wave I amplitude (37% of the amplitude in Pjvk+/+ littermates;
p < 0.001) suggested a dysfunction of the IHCs. Controlled sound
exposure induced further decreases in ABR wave I amplitude in Pjvk-/-
and Pjvkfl/fl Myo15-cre+/- mice (48% and 55% of pre-exposure amplitude,
respectively) with respect to Pjvk+/+ mice (108%; p < 0.001 for both
comparisons) (Figure 2A), demonstrating that Pjvk-/- IHCs are
hypervulnerable to sound. As shown above, OHCs are also affected by the
pejvakin deficiency. Controlled sound exposure triggered a mean decrease
in the DPOAE amplitude of 16.9 ± 7.2 dB in the 12 to 20 kHz frequency
interval in Pjvk-/- mice with persistent DPOAEs (n = 8; p < 0.0001), and
an increase in DPOAE threshold, but it had no effect on the DPOAEs of
Pjvk+/+ mice (n = 9; p = 0.51) (Figure 2B). OHCs lacking pejvakin are
thus also hypervulnerable to sound.

We investigated the effect of the absence of pejvakin on the auditory
pathway by comparing electrically evoked brainstem responses (EEBR) in
Pjvk-/- and Pjvkfl/fl Myo15-cre+/- mice (see the Supplemental
Experimental Procedures). The amplitudes of the most distinctive EEBR
waves, E II and E IV, did not differ between the two types of mice (for
wave E IV: 2.6 ± 1.8 µV in Pjvk-/- mice [n = 18] and 2.2 ± 1.2 µV in
Pjvkfl/fl Myo15-cre+/- mice [n = 11]; t test, p = 0.13). However,
following controlled electrical exposure at 200 impulses/s for 1 min, as
opposed to electric-impulse stimulation with 16 impulses/s for 10 s for
pre- and post-exposure EEBR tests, E II and E IV EEBR wave amplitudes
decreased by 41% and 47%, respectively, for at least 3 min, in Pjvk-/-
mice (n = 5; paired t test, p = 0.02 and p = 0.01, respectively), but
were unaffected in Pjvkfl/fl Myo15-cre+/- mice (n = 10; p = 0.83)
(Figures 2D and 2G-2I). The E II-E IV interwave interval was 0.41 ms
longer in Pjvk-/- mice (n = 5) than in Pjvkfl/fl Myo15-cre+/- mice
(n = 10; p = 0.003), and controlled electrical exposure extended this
interval by a further 0.15 ms in Pjvk-/- mice only (paired t test,
p = 0.001) (Figures 2H and 2I). Likewise, the latency interval between
ABR wave I and wave IV (the counterpart of wave E IV), abnormal in
one-third of the Pjvk-/- mice tested (with an ABR threshold < 95 dB SPL,
n = 12) (Figures 2C and 2E), became abnormal in all of them after
controlled sound exposure (0.16 ms further increase; paired t test,
p < 0.001). By contrast, it remained normal in Pjvkfl/fl Myo15-cre+/-
mice (n = 10 ears; p = 0.73) (Figures 2C and 2F). Thus, the absence of
pejvakin affects the propagation of action potentials in the auditory
pathway after both controlled electrical and sound exposure in the
Pjvk-/- mice.

To clarify whether these abnormalities were of neuronal or glial origin,
we performed a rescue experiment in Pjvk-/- mice, using adeno-associated
virus 8 (AAV8) vector-mediated transfer of the murine pejvakin cDNA
(AAV8-Pjvk). AAV8 injected into the cochlea transduces the primary
auditory neurons (cochlear ganglion neurons) and neurons of the cochlear
nucleus (Figure S2A), but not the hair cells. All Pjvk-/- mice (n = 7)
injected on P3 and tested on P21 had normal ABR interwave I-IV latencies
(Figure 2J), and their EEBR wave E IV amplitude was insensitive to
controlled electrical stimulation (1.91 ± 0.97 µV before and
1.87 ± 1.07 µV after stimulation; paired t test, p = 0.59) (Figures 2K
and 2L). The absence of pejvakin thus renders auditory pathway neurons
hypervulnerable to exposure to mild, short sound stimuli.

Hypervulnerability to Sound in DFNB59 Patients 

We then investigated whether the hearing of DFNB59 patients was also
hypervulnerable to sound exposure. We tested five patients carrying the
p.T54I mutation (Delmaghani et al., 2006). Transient-evoked OAEs
(TEOAEs) assessing OHC function over a broad range of frequencies were
detected for all ears, despite the severe hearing impairment (hearing
threshold increasing from 66 dB HL at 250 Hz to 84 dB at 8 kHz).
Following minimal exposure to impulse stimuli (clicks at 99 dB nHL), ABR
waves were clearly identified in response to 250 clicks. When exposure
was prolonged to 1,000 clicks (the standard procedure), wave V, the
equivalent of mouse ABR wave IV, which was initially conspicuous,
displayed a decrease in amplitude (to 39% ± 30% of its initial
amplitude) and an increase in latency (of 0.30 ± 0.15 ms) (Figures 3A,
3C, and 3D). In parallel, the I-V interwave interval increased by
0.30 ± 0.15 ms. Wave V amplitude and latency recovered fully after
10 min of silence (Figure 3B). In control patients with sensorineural
hearing impairment of cochlear origin matched for ABR thresholds,
similar sound stimulation did not affect ABR wave V amplitude
(105% ± 14% of the initial amplitude after exposure; n = 13 patients) or
latency (-0.02 ± 0.07 ms change after exposure) (Figures 3C and 3D).
Exposure of the DFNB59 patients to 1,000 clicks also affected TEOAEs
(6.1 ± 5.2 dB nHL decrease in amplitude; paired t test, p = 0.02).
Therefore, as in pejvakin-deficient mice, the cochlear and neuronal
responses of DFNB59 patients were affected by exposure to low-energy
sound.

Redox Status Abnormalities and ROS-Induced Cell
Damage in the Cochlea of Pjvk-/- Mice

We studied the Pjvk-/- cochlea by light microscopy on semithin sections
and electron microscopy. On P15 and P21, both OHCs and IHCs were normal
in number and shape. Their hair bundles (the mechanoreceptive structures
responding to sound), the ribbon synapses of the IHCs, and their primary
auditory neurons were unmodified (data not shown). On P30, we observed
the loss of a few OHCs (16% ± 11%, n = 5 mice), restricted to the basal
region of the cochlea (tuned to high-frequency sounds). From P30 onward,
OHCs, cochlear ganglion neurons, and then IHCs disappeared, and the
sensory epithelium (organ of Corti) progressively degenerated
(Figure S3A).

Figure 2. Effects on Auditory Function of Brief Exposure to Moderately Intense Stimuli in Pjvk+/+, Pjvk-/-, and Pjvkfl/fl Myo15-cre+/- Mice

(A-C) ABR wave I amplitude (A), DPOAE amplitude (B), and ABR interwave I-IV latency (C) in Pjvk+/+, Pjvk-/-, and Pjvkfl/fl Myo15-cre+/- mice, before (dots) and
after (crosses) controlled sound exposure, revealing the hypervulnerability to sound of both types of cochlear hair cells (IHCs and OHCs) and of the neural
pathway.

(D) EEBR wave E IV amplitude before and after controlled electrical exposure in Pjvk-/- and Pjvkfl/fl Myo15-cre+/- mice was abnormal and hypervulnerable only
when pejvakin is absent from auditory neurons (Pjvk-/- mice).

(E and F) Examples of ABRs in Pjvk-/- and Pjvkfl/fl Myo15-cre+/- mice: the latency of wave I is affected by controlled sound exposure in both mutant mice, and
wave IV displays an additional increase in latency only in Pjvk-/- mice.

(G-I) Examples of EEBRs in Pjvk+/+ (G), Pjvk-/- (H), and Pjvkfl/fl Myo15-cre+/- (I) mice; EEBRs are affected by controlled electrical exposure only in Pjvk-/- mice.

(legend continued on next page)

We investigated possible changes in gene expression in the organ of
Corti of P15 Pjvk-/- mice, by microarrays (see the Supplemental
Experimental Procedures). Eighteen genes had expression levels at least
1.5-fold higher or lower in Pjvk-/- mice than in Pjvk+/+ mice. Marked
differences were observed for four genes involved in the redox balance,
CypA, Gpx2, c-Dct, and Mpv17, which encode cyclophilin A, glutathione
peroxidase 2, c-dopachrome tautomerase, and Mpv17, respectively
(Table S1). All of these genes were downregulated in Pjvk-/- mice, a
result confirmed by qRT-PCR (Figure S4A), and all encode antioxidant
proteins, suggesting that Pjvk-/- mice have impaired antioxidant
defenses (Table S1).

We thus assessed the level of oxidative stress in the cochlea of P21
Pjvk-/- mice, by determining the ratio of reduced to oxidized
glutathione (GSH:GSSG). The GSSG content was about three times larger
than in Pjvk+/+ mice, whereas the GSH content was 23% smaller, resulting
in a GSH:GSSG ratio in Pjvk-/- cochleas reduced by a factor of 3.4
(Figure 4A). Pejvakin deficiency thus results in cochlear oxidative
stress.
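
The arithmetic linking these three figures can be checked with a short sketch. It assumes that the GSH:GSSG ratio is simply the quotient of the two contents, so that the fold reduction of the ratio equals the GSSG fold increase divided by the remaining GSH fraction. The function and variable names are illustrative and are not taken from the study.

```python
def ratio_fold_reduction(gssg_fold: float, gsh_fraction: float) -> float:
    """Factor by which GSH:GSSG falls when the GSSG content is multiplied
    by `gssg_fold` and the GSH content by `gsh_fraction`."""
    return gssg_fold / gsh_fraction


gsh_fraction = 1.0 - 0.23    # GSH content 23% smaller than in Pjvk+/+ mice
reported_reduction = 3.4     # reduction of the ratio reported in the text

# GSSG fold increase implied by the two reported figures (an inference,
# not a value given in the text).
implied_gssg_fold = reported_reduction * gsh_fraction

print(round(implied_gssg_fold, 2))
print(round(ratio_fold_reduction(3.0, gsh_fraction), 2))
```

Under this assumption the implied GSSG increase is roughly 2.6-fold, which is compatible with "about three times"; an exact threefold increase would instead give a reduction of the ratio close to 3.9.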

We assessed lipid peroxidation by reactive oxygen species (ROS) in
Pjvk-/- mice, by immunofluorescence-based detection of the by-product
4-hydroxy-2-nonenal (4-HNE). Strong immunoreactivity was observed in P60
Pjvk-/- hair cells and cochlear ganglion neurons (Figure S3B).
Quantification of lipid peroxidation in microdissected organs of Corti
from P30 Pjvk-/- and Pjvk+/+ mice showed a moderate, but statistically
significant, increase of the malondialdehyde content in the absence of
pejvakin (2.15 ± 0.14 µM in Pjvk-/- versus 1.84 ± 0.11 µM in Pjvk+/+
mice; p = 0.04). Thus, pejvakin deficiency led to impaired antioxidant
defenses in the cochlea, resulting in ROS-induced cell damage.

We then studied electrophysiological features of IHCs and OHCs in the
mature cochlea of P19-P21 Pjvk-/- mice. In IHCs, the number of synaptic
ribbons, Ca2+ currents, and synaptic exocytosis were unaffected
(Figure S5A). We investigated whether Pjvk-/- mice display the main
currents found in mature IHCs, specifically IK,f, which plays a major
part in IHC repolarization and is involved in the high temporal
precision of action potentials in postsynaptic nerve fibers, IK,s, and
IK,n (Oliver et al., 2006). The IK,f current that flows through the
large conductance voltage- and Ca2+-activated potassium (BK) channels, a
well-known target of ROS (Tang et al., 2004), was detected in only 4 out
of 11 Pjvk-/- IHCs, and the mean number of spots immunolabeled for the
BK α-subunit per IHC was much lower in Pjvk-/- mice (5.0 ± 1.4, n = 283
IHCs from seven mice) than in Pjvk+/+ mice (13.9 ± 2.6, n = 204 IHCs
from nine mice; t test, p < 0.001). By contrast, the IK,s and IK,n
currents were not affected (Figures 4B and S5B). The electromotility of
OHCs was moderately impaired in Pjvk-/- mice (Figure S5C). This
contrasted with the total loss of DPOAE in a large majority of Pjvk-/-
mice from P15 on, even at the highest possible stimulus level of 75 dB
SPL. It thus pinpointed the existence of an additional defect, likely a
mechanoelectrical transduction defect, the main determinant of DPOAEs at
high stimulus levels (Avan et al., 2013). The decrease of the cochlear
microphonic potential that reflects mechanoelectrical transduction
currents through OHCs of the basal-most cochlear region indeed
corroborated the DPOAE measurements: this potential, recorded for a
5-kHz sound stimulus at 95 dB SPL, was always larger than 10 µV in
Pjvk+/+ mice (n = 8), but fell between 5 and 3 µV in the P21 Pjvk-/-
mice with residual DPOAEs (n = 2), and below 1 µV in the Pjvk-/- mice
without persisting DPOAEs (n = 6). Taken together, these results
indicate that oxidative stress in the Pjvk-/- cochlea impacts various
electrophysiological properties of the hair cells, particularly
mechanoelectrical transduction and current through BK channels.

Mitochondrial defects are a common cause of ROS overproduction. However,
we did not find evidence that mitochondria were damaged, as
vulnerability of the mitochondrial membrane potential, ΔΨm, to the
uncoupler carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone (FCCP) in
the organ of Corti and cochlear ganglion was similar in P17-P30 Pjvk-/-
and Pjvk+/+ mice, and analysis of Pjvk-/- hair cells by transmission
electron microscopy (TEM) revealed no mitochondrial abnormalities
(Figure S5D; data not shown).

Pejvakin Is a Peroxisome-Associated Protein 

By using Pjvk-/- cochlea as control, we found that neither the
commercially available antibodies nor our initial polyclonal antibody
(Delmaghani et al., 2006) specifically recognized pejvakin (data not
shown). Given the limited divergence of the pejvakin amino-acid sequence
among vertebrates, we tried to elicit an immune response in Pjvk-/- mice
(see the Experimental Procedures). The monoclonal antibody obtained,
Pjvk-G21, labeled peroxisomes stained by peroxisome membrane protein 70
(PMP70) antibodies in transfected HeLa cells expressing pejvakin
(Figure S6A) and in the human HepG2 hepatoblastoma cell line, which is
particularly rich in this organelle (Figure 5A). The specificity of the
Pjvk-G21 antibody was demonstrated by the immunolabeling of peroxisomes
in the hair cells of Pjvk+/+, but not of Pjvk-/- and Pjvkfl/fl
Myo15-cre+/- mice (Figures 5B and S6B).

Prediction programs failed to detect the PTS1 or PTS2 motifs 
in the pejvakin sequence (Mizuno et al., 2008), the targeting sig- 
nals for the importation of peroxisomal matrix proteins into the 
organelle (Smith and Aitchison, 2013), suggesting that pejvakin 
is a peroxisomal membrane or membrane-associated protein. 

Structural Abnormalities of Peroxisomes in the Hair
Cells of Pjvk-/- Mice

We investigated the distribution and morphology of peroxisomes by TEM.
Peroxisomes were identified on the basis of catalase activity detection
using 3,3'-diaminobenzidine as substrate. We focused on OHCs, the first
to display a dysfunction in Pjvk-/- mice. On P30, but not on P15, both
the distribution and shape of peroxisomes differed between Pjvk-/- and
Pjvk+/+ mice (Figure 5E). In Pjvk+/+ OHCs, the peroxisomes were
restricted to an area immediately below the cuticular plate. In Pjvk-/-
mice, the peroxisomes located just below the cuticular plate were
slightly larger than those in Pjvk+/+ mice. Strikingly, irregular
catalase-containing structures, some of which were juxtaposed, were
present in the perinuclear region, in the immediate vicinity of the
nuclear membrane of all Pjvk-/- OHCs, but not of Pjvk+/+ OHCs
(Figure 5E). The lack of pejvakin thus results in peroxisome
abnormalities in OHCs after the onset of hearing.

(J-L) Neuronal function rescue in Pjvk-/- mice by transduction with AAV8-Pjvk: effects on ABR interwave I-IV latency (J), on EEBR wave E IV amplitude and its
hypervulnerability to electrical stimulation (K), and on EEBR interwave E II-E IV latency (one example is shown in L, to be compared with H). Vertical arrows
indicate the positions of waves I and IV on ABR traces and of waves E II and E IV on EEBR traces. ns, not significant; ***p < 0.001. Error bars represent the SD.
See also Figures S1 and S2A.

Figure 3. Hypervulnerability to Sound in DFNB59 Patients

(A) ABR waves I, III, and V (vertical arrows) in one ear of a patient carrying the
PJVK p.T54I mutation, in response to 250, 500, and 1000 impulse stimuli
(clicks) at 99 dB nHL.

(B) Repeated ABRs after 10 min of silence, with an even larger vulnerability of
waves I, III, and V.

(C and D) Distributions of the amplitude (C) and latency (D) of ABR wave V in the
tested sample of p.T54I patients (n = 8 ears), and in a control group of patients
(n = 13) with cochlear hearing impairment and matched ABR thresholds,
before and after exposure to clicks #250 to #1000. Boxes extend from the 25th
to the 75th percentile. Horizontal bars and vertical bars indicate median values

Pejvakin Is Involved in Oxidative Stress-Induced
Peroxisome Proliferation

In HepG2 cells, protrusions emerging from some peroxisomes, the first
step of peroxisome biogenesis from pre-existing peroxisomes, were
immunoreactive for pejvakin. String-of-beads structures corresponding to
elongated and constricted peroxisomes, preceding final fission (Smith
and Aitchison, 2013), were also pejvakin-immunoreactive, suggesting a
role for this protein in peroxisome proliferation (Figure S6C).
Peroxisomes actively contribute to cellular redox balance, by producing
and scavenging/degrading H2O2 through a broad spectrum of oxidases and
peroxidases (especially catalase), respectively (Schrader and Fahimi,
2006). Because Pjvk-/- mice displayed features of marked oxidative
stress in the cochlea, we investigated the possible role of pejvakin in
peroxisome proliferation in response to oxidative stress induced by H2O2
(Lopez-Huertas et al., 2000). Embryonic fibroblasts derived from Pjvk+/+
and Pjvk-/- mice were exposed to H2O2 (see the Supplemental Experimental
Procedures). In unexposed cells, the number of peroxisomes was similar
between the two genotypes (t test, p = 0.82). After H2O2 treatment, it
increased by 46% in Pjvk+/+ fibroblasts (p = 0.004), but remained
unchanged in Pjvk-/- fibroblasts (p = 0.83), resulting in a
statistically significant difference between the two genotypes
(p < 0.001) (Figures 5C and S7A).

We then asked whether mutations reported in DFNB59 patients also affect
peroxisome proliferation. We assessed the number of peroxisomes in
transfected HeLa cells producing EGFP alone, EGFP and murine pejvakin,
or EGFP and one of the mutated forms of murine pejvakin carrying the
mutations responsible for DFNB59 (p.T54I, p.R183W, p.C343S, or
p.V330Lfs*7). Cells producing the non-mutated pejvakin had larger
numbers of peroxisomes than cells producing EGFP alone, whereas cells
producing any of the mutated forms of pejvakin (mutPjvk-IRES-EGFP) had
smaller peroxisome numbers. In addition, many of these cells contained
enlarged peroxisomes, a feature typical of peroxisome proliferation
disorders (Ebberink et al., 2012) (Figures 5D and S7B). Together, these
results strongly suggest that pejvakin is directly involved in the
production of new peroxisomes from pre-existing peroxisomes.

Upregulation of Pjvk Cochlear Transcription and
Peroxisome Proliferation in Response to Sound

We then asked whether pejvakin is involved in the physiological response
to sound. We first assessed the transcription of Pjvk and of CypA, Gpx2,
c-Dct, and Mpv17, which were downregulated in Pjvk-/- mice, in
microdissected organs of Corti from P21 wild-type mice, with or without
prior sound stimulation (5-20 kHz, 105 dB SPL for 1 hr; see the
Supplemental Experimental Procedures). Transcript levels were analyzed
by qRT-PCR at various times (1, 3, 6, and 18 hr) after sound exposure
(Figure 6A). Pjvk transcript levels had increased by factors of
1.9 ± 0.1 and 3.5 ± 0.7, mean ± SEM, after 1 and 6 hr, respectively.
CypA, c-Dct, and Mpv17 were also upregulated after 6 hr (by factors of
6.6 ± 1.2, 4.3 ± 0.6, and 1.5 ± 0.1, respectively), as were c-Fos and
Hsp70, used as positive controls, but not Gpx2. Thus, noise exposure
leads to an upregulation of the transcription of Pjvk and of genes
downregulated in Pjvk-/- mice, and this effect is dependent on the
acoustic energy level of the stimulation (Figure S4B).

This result predicted that sound exposure would lead to peroxisome
proliferation in the auditory system of wild-type mice. Six hours after
exposure (5-20 kHz, 105 dB SPL for 1 hr), the numbers of peroxisomes
were unchanged (34.5 ± 0.8 and 35.9 ± 1.0, mean ± SEM, per IHC from
unexposed and sound-exposed mice, respectively, n = 75 cells from six
mice; t test, p = 0.25). However, at 48 hr, they had markedly increased,
by a factor of 2.3, in both IHCs and OHCs (84.7 ± 5.0 per IHC and
16.5 ± 1.0 per OHC, n = 90 cells and n = 150 cells from six mice,
respectively) compared to unexposed mice (36.8 ± 3.0 per IHC and
7.3 ± 0.4 per OHC, n = 90 cells and n = 150 cells from six mice,
respectively; t test, p < 0.0001 for both comparisons). The number of
peroxisomes had also increased, by 35%, in the dendrites of primary
auditory neurons (1.7 ± 0.1 and 2.3 ± 0.2 peroxisomes per micrometer of
neurite length, n = 40 neurites from five unexposed and five
sound-exposed Pjvk+/+ mice, respectively; t test, p = 0.003)
(Figure 6B).
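
The fold and percentage changes quoted in this paragraph follow directly from the reported mean counts, as the short sketch below illustrates. It uses only the means given in the text and ignores the SEM values; the helper names are illustrative.

```python
def fold_change(before: float, after: float) -> float:
    """Ratio of the post-exposure mean to the unexposed mean."""
    return after / before


def percent_change(before: float, after: float) -> float:
    """Relative change of the mean, in percent of the unexposed value."""
    return 100.0 * (after - before) / before


# Mean peroxisome counts (unexposed, 48 hr after sound exposure)
# taken from the paragraph above.
counts = {
    "IHC (per cell)": (36.8, 84.7),
    "OHC (per cell)": (7.3, 16.5),
    "dendrite (per micrometer)": (1.7, 2.3),
}

for name, (unexposed, exposed) in counts.items():
    print(name,
          round(fold_change(unexposed, exposed), 2),
          round(percent_change(unexposed, exposed)))
```

Both hair-cell types come out close to a 2.3-fold increase, and the dendritic density rises by about 35%, in agreement with the values stated above.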

Figure 4. Increased Oxidative Stress and
ROS-Induced Cell Damage in the Pjvk-/-
Cochlea

(A) Reduced glutathione (GSH) (left bar chart),
oxidized glutathione (GSSG) (middle bar chart)
contents, and GSH:GSSG ratio (right bar chart) in
P21 Pjvk-/- versus Pjvk+/+ cochlea. Error bars
represent the SEM of three independent experiments.
See also Figure S3.

(B) Marked decrease in the BK α-subunit immunolabeling
in Pjvk-/- IHCs. Left: P20 Pjvk+/+ and
Pjvk-/- IHCs. Scale bar is 5 µm. Right: quantitative
analysis of BK channel clusters. Error bars represent
the SD. See also Figure S5B. *p < 0.05, ***p < 0.001.

See also Figures S3 and S5.

Therapeutic Approaches in Pjvk-/- Mice

Based on these results, we tested whether the classical antioxidant drug
N-acetyl cysteine (NAC) (either alone or associated with α-lipoic acid
and α-tocopherol; see the Supplemental Experimental Procedures)
administered to Pjvk-/- pups could improve their hearing. The ABR
thresholds of P21 NAC-treated Pjvk-/- pups (n = 21) were about 10 dB
lower than those of untreated Pjvk-/- pups (n = 24) for all frequencies
tested (t test, p < 0.001 for all comparisons) (Figure 7A). The
amplitude of the ABR wave I elicited at 105 dB SPL (4.35 ± 1.16 µV,
n = 21) was the same as that of Pjvk+/+ mice (4.36 ± 1.15 µV, n = 18;
t test, p = 0.97) and greater than that of untreated Pjvk-/- mice
(1.88 ± 1.07 µV, n = 24; t test, p < 0.001) (Figure 7B). EEBRs were more
resistant to the high-rate electrical stimulation in treated than in
untreated mutant mice (Figure 7C). Conversely, NAC had no beneficial
effect on OHCs (data not shown). The association of NAC with α-lipoic
acid and α-tocopherol did not perform any better (data not shown).

Full recovery of the neuronal phenotype was achieved by the
intracochlear injection of AAV8-Pjvk (see above). As hair cells are not
transduced by AAV8, we investigated whether AAV2/8, which transduces
hair cells only (Figure S2B), could rescue the Pjvk-/- hair-cell
phenotype. The auditory function of Pjvk-/- mice (n = 7, four pups per
cage in every experiment) receiving intracochlear injections of
AAV2/8-Pjvk-IRES-EGFP on P3 was assessed on P21, and the percentage of
transduced IHCs and OHCs was evaluated in each injected and
contralateral (not injected) cochlea, on the basis of EGFP fluorescence.
Improvements in ABR thresholds of 20 to 30 dB SPL with respect to
untreated mice were observed for frequencies between 10 and 20 kHz
(t test, p < 0.001 for all comparisons; Figure 7D). Upon injection of
AAV2/8-EGFP, DPOAEs, ABR thresholds, and ABR wave I amplitude and
latency were similar to those of untreated Pjvk-/- mice (data not
shown). A partial reversion of the OHC dysfunction was obtained, with
detectable DPOAEs in pejvakin cDNA-treated cochleas (threshold
54.0 ± 10.7 dB), but not in contralateral, untreated cochleas
(Figure 7E). DPOAE thresholds were linearly correlated (r^2 = 0.74,
p < 0.001) with the number of EGFP-tagged OHCs (Figure 7F), suggesting
that the normalization of DPOAE thresholds may be possible if all OHCs
could be transduced. The latency of the ABR wave I in response to a
105 dB SPL stimulation decreased significantly (1.38 ± 0.11 ms for the
treated ears; n = 6, versus 1.53 ± 0.10 ms for the contralateral,
untreated ears; paired t test, p = 0.03) (Figure 7G), and its amplitude
increased into the normal range (7.34 ± 0.80 µV versus 2.93 ± 0.92 µV;
paired t test, p < 0.001) (Figure 7H), in relation to the number of
EGFP-tagged IHCs (r^2 = 0.89 for wave I amplitude, p < 0.001;
Figure 7I). No correction of the interwave I-IV latency was observed, as
expected (data not shown).

Finally, we investigated the effect of the transduction of Pjvk-/- IHCs
by AAV2/8-Pjvk-IRES-EGFP on their peroxisomes. Before sound exposure,
the numbers of peroxisomes in IHCs of P21 Pjvk-/- and
AAV2/8-Pjvk-IRES-EGFP-injected Pjvk-/- mice did not differ from that of
Pjvk+/+ mice (30.5 ± 1.9, 32.3 ± 2.1, and 36.8 ± 3.0 peroxisomes,
mean ± SEM per IHC, n = 60 cells from four Pjvk-/- and four AAV2/8-Pjvk
Pjvk-/- mice, and n = 90 cells from six Pjvk+/+ mice, respectively;
t test, p = 0.11 and p = 0.30, respectively). By contrast, 48 hr after
sound exposure (5-20 kHz) at 105 dB SPL for 1 hr, the number of
peroxisomes had decreased by 63% in Pjvk-/- IHCs (30.5 ± 1.9 and
11.2 ± 1.3 peroxisomes per IHC, n = 75 cells from five unexposed and
five sound-exposed Pjvk-/- mice, respectively; t test, p < 0.0001), and
enlarged PMP70-labeled structures were present close to the nucleus
(Figure 7J). In response to the same sound but of a lower intensity,
i.e., 97 dB SPL for 1 hr, the number of peroxisomes was unchanged in
Pjvk-/- IHCs (30.5 ± 1.9 and 34.6 ± 2.3 peroxisomes per IHC, n = 60
cells from four unexposed and four sound-exposed Pjvk-/- mice,
respectively; t test, p = 0.17), and no enlarged PMP70-stained
structures were detected (data not shown). The absence of pejvakin thus
resulted in defective sound-induced peroxisomal proliferation (both at
105 dB SPL and 97 dB SPL) and even in peroxisome degeneration (at 105 dB
SPL) in IHCs. In Pjvk-/- mice injected with AAV2/8-Pjvk-IRES-EGFP on P3
and exposed to 105 dB SPL for 1 hr on P21, enlarged PMP70-labeled
structures were no longer detected in transduced IHCs, and the number of
peroxisomes increased by 35% (32.3 ± 2.1 and 43.7 ± 3.0 peroxisomes per
IHC, n = 60 cells from unexposed and exposed transduced Pjvk-/- IHCs,
respectively; t test, p = 0.002) (Figure 7J). We conclude that pejvakin
re-expression fully protects Pjvk-/- IHCs from the degeneration of
peroxisomes and partially restores their impaired adaptive
proliferation.

Figure 5. Pejvakin Is a Peroxisome-Associated Protein Involved in the Oxidative Stress-Induced Peroxisomal Proliferation

(A and B) Immunolabeling of PMP70 and endogenous pejvakin in a HepG2 cell (A) and in two P20 Pjvk+/+ IHCs (B). See also Figure S6B.

(C) Number of peroxisomes in Pjvk+/+ and Pjvk-/- mouse embryonic fibroblasts (MEFs) subjected to 0.5 mM H2O2 versus untreated MEFs (n = 30 cells for each
condition). See also Figure S7A.

(D) Untransfected HeLa cells (NT) and transfected cells producing either EGFP alone or EGFP, together with the wild-type pejvakin (Pjvk) or a mutated Pjvk
(p.T54I, p.R183W, p.C343S, or p.V330Lfs*7). Left panel: bar chart showing the numbers of peroxisomes per cell 48 hr after transfection. There were on average
33% more peroxisomes in cells producing both EGFP and Pjvk (n = 200) than in cells producing EGFP alone (n = 150). Right panel: for every range of enlarged
peroxisome size, x (0.6-0.8 µm, 0.8-1.0 µm, and >1.0 µm), in two perpendicular directions, the proportion of cells containing at least one peroxisome. See also
Figure S7B.

(E) Abnormalities in shape and distribution of peroxisomes in mature Pjvk-/- OHCs detected by TEM (P30 Pjvk-/- [middle and right] and Pjvk+/+ [left] OHCs).
Insets [middle panel]: enlarged views of individual peroxisomes. In Pjvk+/+ OHCs, peroxisomes are grouped just under the cuticular plate (CP) (arrowheads), with
none detected in the perinuclear region (n = 33 sections, upper bar chart). In Pjvk-/- OHCs, some peroxisomes remain under the CP (arrowheads), but
catalase-containing structures, misshapen peroxisomes (arrows), are detected in the perinuclear region (n = 24 sections, upper bar chart). Peroxisomes located under the
CP are larger in Pjvk-/- OHCs (n = 92 peroxisomes) than in Pjvk+/+ OHCs (n = 89 peroxisomes) (lower bar chart). N, cell nucleus. **p < 0.01, ***p < 0.001. Error bars
represent the SEM. Scale bars are 5 µm in (A) and (B) and 0.5 µm in (E).

See also Figures S6 and S7.

Figure 6. Effect of Exposure to Loud Sounds
on the Cochlear Expression of Pjvk and the
Number of Peroxisomes in Cochlear Hair
Cells and Ganglion Neurons

(A) Pjvk, c-Dct, CypA, Mpv17, and Gpx2 transcript
levels assessed by qRT-PCR in P21 Pjvk+/+ organ
of Corti 1, 3, 6, and 18 hr after sound exposure
(5-20 kHz, 105 dB SPL for 1 hr). The levels of c-Fos
and Hsp70 transcripts were used as positive controls.
See also Figure S4B.

(B) Peroxisome proliferation in P21 Pjvk+/+ hair
cells and cochlear ganglion neurons after sound
exposure (same conditions as in A). Peroxisomes
were counted 48 hr after sound exposure. OHCs,
IHCs, and neuronal processes stained for F-actin,
myosin VI, and neurofilament protein NF200,
respectively. In OHCs and IHCs, the peroxisomes
are located below the CP and throughout the
cytoplasm, respectively. For OHCs, both a lateral
view and a transverse optical section at the level of
CP (scheme on the right) are shown. The number of
peroxisomes was increased in OHCs, IHCs, and
dendrites after sound exposure. N, cell nucleus.
***p < 0.001. Error bars represent the SEM. Scale
bars are 5 µm.

See also Figure S4 and Table S1.

Figure 7. Therapeutic Approaches in Pjvk-/- Mice

(A-C) Effect of N-acetyl cysteine (NAC) on auditory function in Pjvk-/- mice. (A) ABR thresholds in untreated versus NAC-treated P21 Pjvk-/- mice. (B) ABR wave I
amplitude for 10 kHz tone bursts in Pjvk+/+ and untreated Pjvk-/- versus NAC-treated Pjvk-/- mice at P21. (C) EEBR wave E IV amplitude before (dots) and after
(crosses) controlled electrical stimulation of the cochlear nerve at 200 impulses/s for 1 min in Pjvk+/+, untreated Pjvk-/-, and NAC-treated Pjvk-/- mice.

(D-I) Effect of AAV2/8-Pjvk-IRES-EGFP transferred into the cochlear hair cells on the auditory function of Pjvk-/- mice. See also Figure S2B. (D) ABR thresholds at
10, 15, and 20 kHz in AAV2/8-Pjvk-IRES-EGFP-treated versus untreated Pjvk-/- mice. (E and H) DPOAE threshold (E) and ABR wave I amplitude (H) at 10 kHz in

(legend continued on next page)

DISCUSSION 

Noise-induced hearing loss (NIHL) is the second most common 
form of sensorineural hearing impairment after presbycusis in 
the United States (Dobie, 2008). Here, we describe a genetic 
form of NIHL, by showing that pejvakin deficiency in mice and 
DFNB59 patients leads to hypervulnerability to sound, due to a 
peroxisomal deficiency. To our knowledge, a peroxisomal cause 
of an isolated (non-syndromic) form of inherited deafness has not 
been reported yet. The peroxisome emerges as a key organelle in 
the redox homeostasis of the auditory system, for coping with the 
overproduction of ROS induced by high levels of acoustic energy. 

Acoustic energy is the main determinant of NIHL. The Lex,8 hr
(for level of exposure over an 8-hr workshift) index has been
defined such that an Lex,8 hr of X dB delivers the same energy
as a stable sound of X dB played over a period of 8 hr. Chronic
occupational exposures to less than 85 dB (or 80 dB, depending
on the country) are deemed safe. In Pjvk-/- mice, a single exposure
to 63 dB Lex,8 hr increased hearing thresholds by 30 dB, with
full recovery occurring after about 2 weeks. By contrast, a ten
times more energetic exposure to an Lex,8 hr of 73 dB in wild-type
mice of the same strain produces only an 18 dB shift in
threshold, with a recovery time of 12 hr (Housley et al., 2013).
This hypersensitivity of Pjvk-/- mice to noise suggests that the
Lex,8 hr of about 83 dB for a cage of ten pups is sufficient to account
for permanent hearing loss in these Pjvk-/- pups, while
some of those housed in small numbers in quiet rooms can
display near-normal hearing thresholds (see Figure 1C). Likewise,
the auditory function of DFNB59 patients was transiently affected
by a 57 dB Lex,8 hr exposure, routinely used in ABR tests.
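
The energy comparison in this paragraph follows from the equal-energy definition of the index. The sketch below assumes the conventional normalization, in which a steady level L held for T hours corresponds to Lex,8 hr = L + 10 log10(T/8); the function names are illustrative only and do not come from the study.

```python
import math


def lex_8h(level_db: float, duration_hr: float) -> float:
    """Normalize a steady level (dB) held for duration_hr to an 8-hr equivalent."""
    return level_db + 10.0 * math.log10(duration_hr / 8.0)


def energy_ratio(lex_a_db: float, lex_b_db: float) -> float:
    """Ratio of acoustic energy delivered by exposure a relative to exposure b."""
    return 10.0 ** ((lex_a_db - lex_b_db) / 10.0)


# A steady sound of X dB over the full 8 hr gives an index of X dB.
print(lex_8h(85.0, 8.0))         # 85.0
# 73 dB versus 63 dB: a 10 dB difference is a tenfold difference in energy.
print(energy_ratio(73.0, 63.0))  # 10.0
```

Under this definition, the 73 dB exposure cited for wild-type mice carries ten times the energy of the 63 dB exposure given to the mutant mice, which is the comparison made in the text.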

NIHL involves the excessive production of ROS, overwhelming 
the antioxidant defense system and causing irreversible oxidative 
damage to DNA, proteins, and lipids within the cell (Henderson 
et al., 2006). Noise-induced oxidative stress results in the production
of H2O2 and other ROS as by-products, thought to derive
from the intense solicitation of mitochondrial activity, and several
mouse mutants with mitochondrial defects are prone to NIHL
(Ohlemiller et al., 1999; Brown et al., 2014). Our studies of pejvakin-
deficient mouse mutants and rescue experiments targeting the 
hair cells and auditory neurons unambiguously show that IHCs, 
OHCs, primary auditory neurons, and neurons of the cochlear nu- 
cleus are hypervulnerable to sound in the absence of pejvakin, 
which is consistent with previous results showing that hair cells 
and neurons of the auditory system are targets of NIHL (Wang 
et al., 2002; Kujawa and Liberman, 2009; Imig and Durham, 
2005). However, our study goes one step further by implicating 
a possible common mechanism: peroxisomal failure, the impor- 
tance of which is demonstrated by the impairment of the redox 
homeostasis caused by pejvakin deficiency. It also reveals a ma- 
jor cause of the unusually high level of phenotypic variability 
observed in pejvakin-deficient mice and humans: the difference 
in sound exposure and the inability of the peroxisomes to cope 
with the resultant activity-dependent oxidative stress in the 
absence of pejvakin. Incidentally, this can account for the
apparent paradox that mice carrying the R183W mutation in pejvakin
displayed a much more severe neural pathway defect than
the Pjvk-/- mice (Delmaghani et al., 2006). Due to the preservation
of hair cell functions, the auditory neurons of R183W mutant
mice should be strongly stimulated, whereas the early permanent
damage to cochlear hair cells in Pjvk-/- mice acts as a protective
“muffler” of the neuronal pathway.

In mammals, the number and metabolic functions of peroxi- 
somes differ between cell types. However, all cell types are 
able to adapt rapidly to modifications in physiological conditions 
by changing the number, shape, size, and molecular content of 
peroxisomes, resulting in considerable functional plasticity of 
these organelles (Schrader et al., 2012; Smith and Aitchison,
2013). Our experiments on Pjvk-/- and Pjvk+/+ mouse embryonic
fibroblasts stressed with H2O2 showed that pejvakin is critically
involved in the oxidative stress-induced proliferation of peroxisomes
through growth and fission of pre-existing peroxisomes.
The molecular machinery underlying this adaptive process is still
poorly understood beyond the involvement of Pex11α (Li et al.,
2002). Of note, the absence of pejvakin only affects the proliferation
of peroxisomes from pre-existing peroxisomes, but not the
constitutive biogenesis of this organelle. Accordingly, structural
abnormalities of peroxisomes in Pjvk-/- mice became apparent
only after hearing onset, in the context of the oxidative stress 
produced by noise exposure. By contrast, the PEX gene defects 
causing Zellweger syndrome spectrum (ZSS) disorders (Water- 
ham and Ebberink, 2012) and rhizomelic chondrodysplasia 
punctata affect the constitutive biogenesis of peroxisomes. 
Hearing impairment in ZSS disorders involves a severe impair- 
ment of neuronal conduction and has been attributed to defects 
in the synthesis of two essential myelin sheath components— 
plasmalogens and docosahexaenoic acid— which is critically 
dependent on peroxisomes. Our results suggest that ZSS also 



treated versus untreated contralateral ears. (G) ABR wave I latency in treated versus untreated contralateral ears. (F) Correlation between DPOAE thresholds and 
the proportion of EGFP-tagged (i.e., transduced) OHCs. Six untreated ears have no recordable DPOAE (threshold arbitrarily set at 80 dB SPL; red diamond). (I) 
Correlation between ABR wave I amplitude at 10 kHz, 105 dB SPL, and the percentage of transduced IHCs (EGFP tagged). 

(J) Effect of AAV2/8-Pjvk-IRES-EGFP on the peroxisomes in Pjvk-/- IHCs. Upper and lower panels show and quantify (bar charts) the peroxisomes in untreated
mice 48 hr after sound exposure (5-20 kHz, 105 dB SPL for 1 hr) (peroxisome abnormalities are indicated by arrowheads). Error bars represent the SD in (A-I) and
the SEM in (J). ns, not significant; *p < 0.05, **p < 0.01, ***p < 0.001.

includes a defective redox balance in the hair cells and neurons
of the auditory system. 

In the context of noise exposure, the upregulation of Pjvk transcription
in the cochlea and the subsequent peroxisome proliferation
in the hair cells and auditory neurons of wild-type mice
suggest that pejvakin-dependent peroxisome proliferation in
the auditory system is part of the physiological response to
high levels of acoustic energy that result in increased amounts
of ROS. This and the marked oxidative stress detected in the
Pjvk-/- cochlea imply that the proliferation of peroxisomes plays
an antioxidant role, similar to that reported in other cell types
(Santos et al., 2005; Diano et al., 2011). The rapid elevation of
the hearing threshold in Pjvk-/- mice in response to low-energy
sounds and the increase in interwave I-IV latency observed in
DFNB59 patients within a few seconds are consistent with an
activity-dependent H2O2 production that, due to impaired cellular
redox homeostasis, results in concentrations of H2O2 high
enough to affect the activity of various target proteins
including ion channels and transporters (Rice, 2011). The worsening
of hearing sensitivity, 2 days later, in the mutant mice lacking
pejvakin, exacerbated by returning the mice to a noisy
environment, fits the picture of the absence of sound-induced
biogenesis of peroxisomes (with their degeneration occurring
in a high acoustic energy environment). We thus conclude that
the hypervulnerability of Pjvk-/- mice and DFNB59 patients to
sound does not result simply from an exacerbation, by sound,
of a pre-existing redox-balance defect, but is the consequence
of impaired adaptive proliferation of peroxisomes in the absence
of pejvakin. Both defective peroxisome proliferation in IHCs of
Pjvk-/- mice in response to sound exposure and its partial recovery
by pejvakin cDNA transfer support this conclusion. A
full recovery of the adaptive peroxisome proliferation produced
by sound exposure may require higher concentrations of pejvakin
or the sound-induced modulation of Pjvk transcription (see
Figure 6A), which was missing in our rescue experiments (pejvakin
cDNA expression being driven by a constitutive promoter).

In patients with hearing impairment, the amplification of sound 
by hearing aids or direct electrical stimulation of the auditory 
nerve by a cochlear implant delivers a stimulus with an energy 
level similar to that shown here to worsen the hearing impairment 
of Pjvk-/- mice within 1 min of sound exposure. Therefore, in
cases of peroxisomal deficiency, as in DFNB59, specific protection
against redox homeostasis failure is essential. Patients with
such conditions should avoid noisy environments, and any benefit
from hearing devices is likely to require antioxidant
protection. N-acetyl cysteine was the only antioxidant drug
tested here to display some, albeit limited, efficacy. By contrast, 
AAV-mediated gene therapy could potentially provide full pro- 
tection. Finally, deciphering the sound-stress-induced protec- 
tive signaling pathway involving pejvakin might lead to the 
discovery of therapeutic agents for NIHL. 

EXPERIMENTAL PROCEDURES 
Audiological Studies in Mice 

Auditory tests were performed in an anechoic room, on anesthetized animals
whose core temperatures were maintained at 37°C (see the Supplemental
Experimental Procedures).

Audiological Tests in Patients

Informed consent was obtained from all the subjects included in the study.
Pure-tone audiometry was performed with air- and bone-transmitted tones.
Hearing impairment was assessed objectively, by measuring ABRs and
transient-evoked otoacoustic emissions (TEOAEs). The nonlinear TEOAE
recording procedure was used (derived from the ILO88 system), making it
possible to extract TEOAEs from linear reflection artifacts from the middle
ear, and to evaluate background noise. TEOAE responses were analyzed in
1-kHz-wide bands centered on 1, 2, 3, and 4 kHz.

Generation of an Anti-pejvakin Monoclonal Antibody

The 3' end of the coding sequence of the Pjvk cDNA (NCBI:NM_001080711.2)
was inserted into a pGST-parallel-2 vector (derived from pGEX-4T-1; Amersham).
The resultant construct, encoding the C-terminal region of pejvakin
(residues 290-352; RefSeq:NP_001074180.1) fused to an N-terminal glutathione
S-transferase tag, was introduced into Escherichia coli BL21-Gold
(DE3)-competent cells (Stratagene). The pejvakin protein fragment was purified
on a glutathione-Sepharose 4B column, then subjected to size-exclusion
chromatography and used as the antigen for immunization. Antibodies were
produced by immunizing Pjvk-/- mice. An immunoglobulin G monoclonal antibody
(Kd of 6 X 10“® M), Pjvk-G21, was selected by ELISA on immunogen-coated
plates.

Statistical Analyses 

Quantitative data are presented as mean ± SD, unless otherwise mentioned.
Statistical analyses were performed using GraphPad. Data were analyzed by 
paired or unpaired Student’s t tests and, for multiple comparisons, either by 
one-way or two-way ANOVA or by t tests with the Bonferroni correction. Sta- 
tistical significance of the differences observed between groups is defined as 
p < 0.05. 
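
The Bonferroni correction mentioned above can be illustrated with a short sketch. It assumes the simplest form of the correction, in which each raw p value is multiplied by the number of comparisons and capped at 1; the helper name and the example p values are hypothetical and are not taken from the study or from GraphPad.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[tuple[float, bool]]:
    """Return (adjusted p, significant?) for each raw p value."""
    m = len(p_values)  # number of comparisons being corrected for
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # Bonferroni-adjusted p value, capped at 1
        results.append((adjusted, adjusted < alpha))
    return results


# Hypothetical raw p values from three pairwise t tests (not data from the study).
raw = [0.002, 0.020, 0.400]
for p, (adj, significant) in zip(raw, bonferroni(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {adj:.3f}, significant: {significant}")
```

Comparing the adjusted p values with 0.05 is equivalent to comparing the raw p values with 0.05 divided by the number of comparisons.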
SUMMARY

Steroid hormones are a large family of cholesterol 
derivatives regulating development and physiology 
in both the animal and plant kingdoms, but little 
is known concerning mechanisms of their secretion 
from steroidogenic tissues. Here, we present evi- 
dence that in Drosophila, endocrine release of the 
steroid hormone ecdysone is mediated through a 
regulated vesicular trafficking mechanism. Inhibition 
of calcium signaling in the steroidogenic prothoracic 
gland results in the accumulation of unreleased 
ecdysone, and the knockdown of calcium-mediated 
vesicle exocytosis components in the gland caused 
developmental defects due to deficiency of ecdy- 
sone. Accumulation of synaptotagmin-labeled vesi- 
cles in the gland is observed when calcium signaling 
is disrupted, and these vesicles contain an ABC 
transporter that functions as an ecdysone pump to 
fill vesicles. We propose that trafficking of steroid 
hormones out of endocrine cells is not always 
through a simple diffusion mechanism as presently 
thought, but instead can involve a regulated vesicle- 
mediated release process. 

INTRODUCTION 

Steroid hormones are an important class of bioactive molecules 
in both animal and plant kingdoms that regulate a wide variety of 
physiological processes including immune response, salt and 
water balance, glucose metabolism, and sexual maturation dur- 
ing juvenile development (Sapolsky et al., 2000; Sisk and Foster, 
2004). In insect larvae, the primary precursor steroid hormone 
ecdysone (E) is produced in the prothoracic gland (PG). After 
its release into the circulatory system, E is taken up in peripheral 
tissues such as the gut and fat body where it is converted to 
20-hydroxyecdysone (20E). This is the active derivative that reg- 
ulates larval molt timing and the onset of metamorphosis leading 
to the formation of sexually mature adults (Yamanaka et al., 
2013a). 

The biosynthetic pathways of steroid hormone production 
have been extensively studied in diverse animal species, and 
many of the key enzymes have been identified and well characterized
(Ghayee and Auchus, 2007; Huang et al., 2008; Miller,
2013). In insects, E biosynthesis is stimulated by extracellular 
signaling molecules such as the prothoracicotropic hormone 
(PTTH), which binds to its receptor Torso to induce the expres- 
sion of genes encoding steroidogenic enzymes (Rewitz et al., 
2009; Yamanaka et al., 2013a). 

In contrast to the extensive literature describing studies on 
steroidogenic processes, very little is known about the mecha- 
nisms that regulate release of steroid hormones from endo- 
crine tissues. Indeed, the textbook view is that lipophilic steroid 
hormones simply enter and exit cells by diffusion across lipid 
bilayers (Raven and Johnson, 2002; Sherwood, 2011; White 
and Porterfield, 2012). However, this prevailing assumption has 
not been extensively tested in vivo, and the limited studies 
described so far primarily used in vitro or in silico approaches 
(Oren et al., 2004; Watanabe et al., 1991). 

Given the scarcity of knowledge concerning this fundamental 
aspect of endocrinology, we used molecular genetic tools 
to investigate the mechanism of E release from the PG in 
Drosophila melanogaster. We found that blocking calcium 
signaling through RNAi-mediated knockdown of the inositol
1,4,5-trisphosphate receptor (IP3R) in the PG leads to a buildup
of E and a decrease of 20E in source and target tissues, respectively.
This results in severe delay or larval developmental
arrest that can be rescued by feeding larvae E. Identical developmental
defects were observed in larvae in which cellular components
normally involved in calcium-mediated vesicle exocytosis,
such as Rab3, UNC-13, or synaptotagmin 1 (Syt1), were
depleted in the PG. Moreover, GCaMP imaging of the PG
just prior to metamorphosis revealed spontaneous calcium
signaling that was attenuated by RNAi-mediated knockdown of
the upstream signaling component that couples G protein-coupled
receptors (GPCRs) to calcium release. Furthermore,
the accumulation of Syt1-positive vesicles was observed when
calcium signaling was blocked in the PG, suggesting that calcium-mediated
vesicle exocytosis is required for E release.
Consistent with this notion, we identified an ABC transporter
found in these Syt1-positive vesicles and show that it transports
E across a lipid bilayer in vitro. Taken together, these results 
support a new hypothesis that transport of steroid hormones 
across lipid bilayers can involve a regulated vesicle-release pro- 
cess instead of, or in addition to, passive or facilitated diffusion 
mechanisms. 
Figure 1. IP3R Is Required for E Secretion from the PG

(A) IP3R-knockdown in the PG causes polyphasic growth arrest. Percentages of developmentally arrested larvae for each genotype are shown. Percentages of
larvae arrested at first or second instar (L1/L2) are indicated in light gray, whereas those arrested at third instar (L3) are shown in dark gray. E feeding is indicated
by + E. Numbers of animals tested are shown on top of each bar.

(B) IP3R-knockdown in the PG causes overgrowth at L3. Representative image of a wandering control larva (control, phm22 > dicer2) and an IP3R-knockdown
larva arrested as L3 (RNAi, phm22 > IP3R RNAi, dicer2). Scale bar, 1 mm.

(C) IP3R-knockdown in the PG causes developmental delay. Developmental time to pupariation of non-arrested larvae is shown. AEL, after egg laying. Data are
represented as mean ± SEM of three to seven independent experiments.

(D) IP3R-knockdown in the PG leads to the formation of overgrown pupae. Representative image of a control pupa (control, phm22 > dicer2), an IP3R-knockdown
pupa (RNAi, phm22 > IP3R RNAi, dicer2) and an IP3R-knockdown pupa rescued by E feeding (RNAi + E, phm22 > IP3R RNAi, dicer2 with E feeding) is shown.
Scale bar, 1 mm.

(E) Separation of E and 20E by a methanol (MeOH) gradient on reverse-phase HPLC. UV absorbance profiles at 248 nm of standard E (dashed black line) and 20E 
(solid black line) are shown on an arbitrary scale. The relative quantity of ecdysteroids in CNS-RG complexes of different groups of animals are shown in solid 
colored lines as E equivalent (pg) in each fraction. Fractions corresponding to 36%-38% MeOH and 42.5%-44.5% MeOH were pooled and used for 20E and E 
quantification, respectively. 

(F) Quantity of ecdysteroids in CNS-RG complexes at different developmental times and larval genotypes. The E titer is indicated in light gray, whereas that of 20E
is shown in dark gray. Each bar represents mean ± SEM of three independent sample preparations. *p < 0.05; **p < 0.01 from Student’s t test; †p < 0.05 compared
to the amount of E in late feeding control larvae (ANOVA with Tukey’s post hoc test).

(G) Quantity of ecdysteroids in the hemolymph at different developmental times and larval genotypes. The E titer is indicated in light gray, whereas that of 20E is 
shown in dark gray. Each bar represents mean ± SEM of three independent sample preparations. 



RESULTS 

IP3R-Mediated Calcium Signaling Is Required for E 
Secretion from the PG 

Studies using isolated PGs of lepidopteran insect species have 
long suggested a key role for calcium in stimulating E production 
and release in response to PTTH (Huang et al., 2008). To test this 
in Drosophila, we conducted PG-specific knockdown of IP3R, 
which encodes an intracellular calcium-release channel. IP3R 
is highly expressed in the PG and mutants have growth defects 
due to low systemic levels of E (Venkatesh and Hasan, 1997), but
direct links between the IP3R function and E production or 
release have yet to be tested. When IP3R RNAi was induced us- 
ing the PG driver phm22-Gal4, we observed polyphasic growth 
arrest throughout larval development (Figure 1A). Those larvae 
that arrested in the third instar stage showed an overgrowth
phenotype due to an extended larval feeding period, which is
commonly observed for E-deficient animals (Figure 1B) (Caceres
et al., 2011; Ou et al., 2011; Rewitz et al., 2009; Talamillo et al.,
2008). Feeding E to these larvae rescued the arrest phenotype,
further suggesting that the systemic E level is low in these animals.
Even those individuals that eventually initiated metamorphosis
(~60%) only did so after a significant delay (Figure 1C)
and also exhibited an overgrowth phenotype (Figure 1D), both
of which were fully rescued by E feeding. Importantly, however,
neither the larval arrest nor pupariation delay phenotype was fully
rescued by overexpressing RasV12, the active form of Ras that is
able to completely rescue a PTTH signaling deficiency (Figures
1A and 1C) (Rewitz et al., 2009). Such partial rescue strongly suggests
that, unlike previous assumptions based on moth studies
(Huang et al., 2008), the intracellular calcium release mediated
by IP3R is not solely functioning upstream of the MAPK signaling
cascade and E synthesis, but is likely regulating other processes
in the Drosophila PG.

One hypothesis we considered was that calcium signaling 
might play a role in E release from the PG as opposed to, or 
in addition to, E synthesis. As a first test of this hypothesis, 
we sought to quantify E and 20E in the PG of phm22 > IP3R 
RNAi, dicer2 larvae. The PG is part of a composite endocrine or- 
gan called the ring gland (RG), which is attached to the CNS. 
The small size of the RG presents a substantial dissection chal- 
lenge. Therefore, we dissected the larger CNS-RG complex, ex- 
tracted steroids with methanol, and separated E from 20E by 
high-performance liquid chromatography (HPLC). The E and 
20E containing fractions were quantified using an ELISA for 
ecdysteroids (Figure 1E). This analysis revealed that the titers 
of both E and 20E increase in the CNS-RG complex of control 
larvae during the wandering stage as they prepare for metamorphosis
(Figure 1F). This observation is consistent with the notion
that the E biosynthetic activity of the PG is stimulated by PTTH 
during this period and the increased levels of 20E in the hemo- 
lymph help initiate wandering behavior when taken up by the 
brain (Yamanaka et al., 2013b). In control larvae, the titer of 
20E was substantially higher than that of E in the CNS-RG com- 
plex during the wandering stage, indicating that the release of E 
into the hemolymph and its conversion to 20E happens quickly 
under normal conditions. Indeed, the 20E titer in the hemolymph 
increased rapidly in wandering control larvae, while the E titer 
remained constant or decreased during this stage (Figure 1G). 
In the case of phm22 > IP3R RNAi, dicer2 larvae, however, 
the ecdysteroid composition in the CNS-RG complex was 
markedly different; the E titer was elevated above that seen at 
any stage in control larvae, and the titer of 20E was lower than
that of E rather than higher (Figure 1F). For IP3R-knockdown animals that
are developmentally delayed, we used wandering as a behavioral
trait and used 150 hr after egg laying (AEL) wandering
larvae for hormone titer measurement, in order to ensure that
we are examining hormone titers at the same developmental
stage. Consistent with this notion, the hemolymph levels of ecdysteroids
in these IP3R-knockdown animals were comparable
to those of early wandering control animals (Figure 1G). From
these results, we infer that: (1) E is still produced in the PG 
even when the intracellular calcium release is downregulated 
by IP3R RNAi, and (2) the release of E into the hemolymph is 
inhibited upon IP3R knockdown, resulting in the accumulation 
of E in the PG. 

Components that Regulate Secretory Vesicle 
Exocytosis Are Required for Proper PG Function 

The above results led us to hypothesize that E secretion from the 
PG is not a fully passive or facilitated diffusion process, but 
instead may employ a regulatory mechanism that is under the 
control of calcium signaling. This is reminiscent of secretory 
vesicle exocytosis in neurons and endocrine cells, where the 
fusion of vesicles containing neurotransmitters, or various types 
of hormones, with the plasma membrane is tightly regulated by 
multiple components that sense intracellular calcium concentra- 
tions (Rizo and Rosenmund, 2008; Sudhof, 2004). Therefore, we 
conducted RNAi knockdown screening of components known to
be either upstream regulators of intracellular calcium release or
downstream effectors of calcium-regulated secretory vesicle
exocytosis (Figure 2A; Table 1).

There are two distinct genes encoding intracellular calcium- 
release channels in Drosophila: IP3R and Ryanodine receptor 
(RyR) (Sorrentino et al., 2000). The knockdown of IP3R in the 
PG showed the larval arrest phenotype as described above, 
whereas that of RyR did not cause any discernible defect 
(Table 1). IP3R is activated by inositol 1,4,5-trisphosphate produced
by phospholipase C (PLC) enzymes, which are encoded
by three genes in Drosophila (Shortridge and McKay, 1995).
PG-specific RNAi knockdown of one of them, Plc21C, caused
the larval arrest phenotype (Table 1). Plc21C encodes a member
of the PLCβ class of enzymes, which are typically activated by the
Gαq subunit of heterotrimeric G proteins coupled with GPCRs.
Although RNAi-mediated knockdown of Gαq did not cause
developmental arrest, we identified CG30054 and CG17760 as
two adjacent genes that are highly homologous to Gαq (Figures
S1A and S1B). PG-specific knockdown of CG30054, but not that
of CG17760, caused a developmental defect similar to knockdown
of Plc21C and IP3R (Table 1). Taken together, these results
indicate that the calcium mobilization from intracellular stores 
through IP3R is regulated by GPCR(s) in the PG (Figure 2A), 
further suggesting that there is an as-yet-unknown signaling 
pathway acting in parallel with the PTTH/Torso/MAPK pathway 
to facilitate E release as opposed to its production. 

The ternary SNARE complex that consists of synaptobrevin/ 
VAMP, SNAP-25, and syntaxin plays a pivotal role in various 
types of vesicle exocytosis in eukaryotes ranging from yeast to 
human (Li and Chin, 2003; Richmond and Broadie, 2002; Rizo 
and Rosenmund, 2008; Sudhof, 2004). Indeed, the PG-specific 
knockdown of Syb, one of the two synaptobrevins in Drosophila, 
causes the larval arrest phenotype, supporting the hypothesis 
that calcium-regulated vesicle exocytosis is required for E secre- 
tion (Table 1). The SNARE complex, however, is involved in all 
intracellular membrane fusion events, and each organism has 
multiple SNARE proteins that are localized to distinct membrane 
compartments to specify intracellular compartmental identity (Li 
and Chin, 2003). These diverse SNARE complex functions make 
it difficult to interpret the above result, since E synthesis in the 
PG involves the trafficking of synthetic intermediates between or- 
ganelles, and the disruption of this process is expected to cause 
similar developmental defects. Likewise, the necessity of the 
exocyst complex components in the PG (Table 1) (Andrews 
et al., 2002) is difficult to attribute solely to its potential role in 
E secretion, since this complex is required for multiple membrane 
trafficking events, including the transport of vesicles carrying 
transmembrane receptors from the trans-Golgi network to the
plasma membrane (Langevin et al., 2005; Murthy et al., 2003). 
That knockdown of exocyst complex members does disrupt trafficking
of transmembrane proteins to the plasma membrane is
clearly shown by the depletion of mCD8-GFP in the plasma membrane
of PG cells of larvae expressing phm22 > Sec10 RNAi
(Figure S2). Such a defect will likely disrupt several signaling path- 
ways including PTTH and insulin signaling, both of which require 
the transport of their receptors to the PG plasma membrane for 
high level E production (Yamanaka et al., 2013a). 

In light of these difficulties in interpreting the phenotypes produced
by knockdown of general secretory machinery subunits,

Figure 2. Regulatory Components of Secretory Vesicle Exocytosis Are Required in the PG for Normal Developmental Progression

(A) Schematic illustration of components involved in calcium-regulated vesicle exocytosis in the PG cells. An unknown GPCR is shown in pink, SNARE complex 
proteins are shown as green ovals, and E is depicted as purple filled circles. 

(B) Knockdown of the regulatory components for secretory vesicle exocytosis in the PG causes polyphasic growth arrest. Percentages of developmentally 
arrested larvae for each genotype are shown. Percentages of larvae arrested at first or second instar (L1/L2) are indicated in light gray, whereas those arrested at 
third instar (L3) are shown in dark gray. E feeding is indicated by + E. Numbers of animals tested are shown on top of each bar. Insets are representative images of 
wandering control larvae (control, phm22 > or phm22>dicer2) and RNAi larvae arrested as L3 (RNAi). Scale bars, 1 mm. 

(legend continued on next page) 
we focused our analysis on components that are more specifically
involved in calcium-regulated exocytosis. UNC-13 is a
highly conserved, plasma membrane-associated presynaptic
protein with calcium-binding domains. It interacts with the
SNARE protein syntaxin and primes synaptic vesicles for fusion
and is essential for calcium-regulated synaptic vesicle exocytosis
(Aravamudan et al., 1999; Richmond et al., 2001). There
are three Drosophila genes that encode UNC-13 family proteins,
and knockdown of only CG34349 produced the larval arrest
phenotype (Figure 2B; Table 1). Likewise, one of the secretory
Rab GTPases, Rab3, and its interacting molecule RIM, both of
which are involved in secretory vesicle trafficking and priming
(Fukuda, 2008; Graf et al., 2012; Muller et al., 2012), are also
essential for proper function of PG cells (Figure 2; Table 1).
Synaptotagmins, which form another class of calcium sensor proteins
critical for vesicle exocytosis (Chapman, 2008; de Wit
et al., 2009), were also tested for their necessity in the PG, and
only Syt1 was found to be required for PG function (Figure 2;
Table 1). The knockdown phenotypes of these regulatory exocytosis
components were strikingly similar to that of IP3R; they all
showed polyphasic larval developmental arrest and pupariation
delay accompanied by overgrowth, both of which were rescued
by E feeding (Figure 2). Moreover, unlike Sec10 RNAi, the knockdown
of these components in the PG did not disrupt the plasma
membrane localization of the mCD8-GFP reporter, suggesting
that constitutive membrane traffic is not altered (Figure S2).

To further validate that the calcium signaling pathway is active 
in the PG cells at the time of metamorphosis initiation, we moni- 
tored calcium dynamics in these cells using the genetically 
encoded calcium indicator, GCaMP5 (Akerboom et al., 2012).
As illustrated in Figures 2D and 2E, two types of spontaneous ac- 
tivities were observed in the PG cells of wandering larvae. One 
consisted of major concentration changes throughout the entire 
volume of a PG cell, which we refer to as macro spikes (Figures 
2D and 2E; Movie S1). The number of active cells within a gland
varied significantly, and the dynamics of the calcium concentra- 
tion observed also varied in amplitude, duration, and frequency 
on a cell-by-cell basis (Figure 2E). A second activity, which we 
refer to as micro spikes (Figure 2E), appeared to occur in a 
limited area on the cell surface and exhibited faster kinetics. 
When the PG-specific knockdown of Plc21C was performed 
with two distinct RNAi constructs, a significant decrease in the 
number of animals exhibiting calcium dynamics of either class 
was observed (Figure 2F), while RNAi of a random control gene 
(gbb) had no effect. These results support the notion that 
GPCR-mediated calcium signaling is occurring in the PG cells 
prior to metamorphosis. 



In order to genetically validate the coupling of GPCR-regulated
calcium release and vesicle exocytosis, an activated form
of Gαq (Gαq[Q203L]) was expressed in the PG (Figure S1C).
Interestingly, constitutive activation of the Gαq pathway in the PG
led to early larval lethality, suggesting that timely activation of
the GPCR signaling pathway is critical for proper larval development.
Importantly, this early larval lethality was rescued by
co-expressing RNAi constructs of the anticipated downstream
components (IP3R, CG34349, Rab3, and Syt1; Figure S1C)
and more larvae developed into L3 or beyond, consistent 
with the knockdown phenotypes of those downstream genes 
(Figures 1 and 2). Taken together, these results suggest that 
the GPCR-regulated calcium release through IP3R is indeed 
coupled with vesicle exocytosis in the PG cells. 

Syt1-Positive Vesicles Accumulate in the PG upon
Calcium Signaling Knockdown

In order to visualize putative secretory vesicles whose exocytosis
is regulated by calcium signaling, we expressed in the PG
eGFP-tagged Syt1 (Syt-GFP), a widely used secretory vesicle
marker in both neuronal and non-neuronal cells (Sugita et al., 
2001; Zhang et al., 2002). In wild-type larvae, Syt-GFP labeled 
both the plasma membrane and a small number of vesicles in 
the PG (Figures 3A and 3C). It is known that Syt-GFP often labels 
the plasma membrane, depending on its expression level (Kanno 
and Fukuda, 2008). Interestingly, knocking down IP3R in the 
PG resulted in prominent accumulation of the Syt-GFP vesicles 
in the cytoplasm, especially in the areas adjacent to the plasma 
membrane (Figures 3B, 3D, and 3E). IP3R knockdown in the PG 
did not alter the gross morphology of the PG (Figures 3B and S2), 
although the size of each cell might be slightly increased, which 
is potentially coupled with the accumulation of vesicles in the 
cytoplasm. This accumulation of Syt-GFP vesicles at the plasma 
membrane was typically observed in the PGs of IP3R RNAi 
larvae during the extended third instar stage (i.e., after 140 hr 
AEL), at the time when E accumulation was observed in the 
CNS-RG complexes (Figure 1F). This observation is consistent
with our hypothesis that E is loaded into Syt1-positive secretory
vesicles in the PG and is released into the hemolymph via exocytosis
triggered by calcium signaling.

Atet Is an ABC Transporter Present in Syt1-Positive
Vesicles in the PG

If E indeed requires vesicle-mediated machinery to be released 
from the PG, there should be transporters on the vesicle surface 
that load E into the vesicles. There is an ATP-binding cassette 
(ABC) transporter, E23, which has been proposed to function 



(C) Knockdown of the regulatory components for secretory vesicle exocytosis in the PG causes developmental delay. Developmental time to pupation among 
non-arrested larvae is shown. Data are represented as mean ± SEM of three to seven independent experiments. Insets are images of control pupae (control, 
phm22 > or phm22>dicer2), RNAi pupae (RNAi), and RNAi pupae rescued by E feeding (RNAi + E). Scale bars, 1 mm. 

(D) GCaMP5 calcium imaging of the PG cells from wandering third instar phm22>GCaMP5 larvae. Cumulative maximum intensity projection of a 5 min time-lapse
imaging is shown. Colored circles indicate the regions of interest (ROIs) where macro calcium spikes were observed. Scale bar, 25 μm.

(E) Plot of mean signal intensity in the cells indicated in (D). The color of each plot corresponds to that of the ROIs in (D). Inset is an example plot of a micro spike
from a different sample.

(F) Quantification of the PGs that presented either macro (dark gray) or micro (light gray) calcium spikes in the animals of different genotypes (control,
phm22>GCaMP5; RNAi, phm22>GCaMP5, RNAi). Numbers of animals observed are shown on top of each genotype. **p < 0.01; ***p < 0.001 from Fisher’s exact
test compared to control.

See also Figure S2 and Movie S1.

Table 1. RNAi Screening of Genes Involved in Membrane Traffic, Intracellular Calcium Signaling, or Regulated Vesicle Exocytosis

Classification                                Gene Name    CG Number   VDRC Transformant ID         Phenotype
Membrane traffic            synaptobrevin     Syb          CG12210     39770*, 102922*              developmental delay/arrest
                                              n-Syb        CG17248     44011, 49201, 104531         no
                            exocyst           Sec3         CG3885      35806, 108085                no
                                              Sec5         CG8843      28873*                       developmental delay/arrest
                                              Sec6         CG5341      22077*, 105836*              developmental delay/arrest
                                              Sec8         CG2095      45032*, 105653*              developmental delay/arrest
                                              Sec10        CG6159      N/A*^                        developmental delay/arrest
                                              Sec15        CG7034      35161*, 105126*              developmental delay/arrest
                                              exo70        CG7127      27867, 103717                no
                                              exo84        CG6095      30111*, 108650*              developmental delay/arrest
Intracellular Ca2+ release  Gαq               CG30054      CG30054     4643*^, 4644*, 102887        developmental delay/arrest
                                              Gαq          CG17759     50729, 105300                no
                                              CG17760      CG17760     42255, 52308, 107613         no
                            phospholipase C   Plc21C       CG4574      26557*, 108395*^=            developmental delay/arrest
                                              norpA        CG3620      105676                       no
                                              PLCγ         CG4200      7173, 108593                 no
                            ER Ca2+ channel   IP3R         CG1063      6484*^, 106982*              developmental delay/arrest
                                              RyR          CG10844     109631                       no
Ca2+-regulated exocytosis   UNC-13            CG34349      CG34349     31571*^, 31573*, 107855      developmental delay/arrest
                                              UNC-13       CG2999      33606, 33609, 101383         no
                                              UNC-13-4A    CG32381     41835, 109304                no
                            secretory Rab     Rab3         CG7576      100787*^=                    developmental delay/arrest
                                              Rab26        CG34410     43730, 101330                no
                                              Rab27        CG14791     31887^, 35774^               no
                            RIM               RIM          CG33547     48072*^^, 39384              developmental delay/arrest
                            synaptotagmin     Syt1         CG3139      100608*^, 8874               developmental delay/arrest
                                              Syt4         CG10047     33317                        no
                                              Syt7         CG2381      24988                        no
                                              Syt12        CG10617     47504, 47506, 110655         no
                                              Syt14        CG9778      11037                        no
                                              Sytα         CG5559      9303, 100957                 no
                                              Sytβ         CG42333     30013, 103345                no


UAS-RNAi lines from Vienna Drosophila RNAi Center (VDRC) were crossed to phm22>dicer2 to induce tissue-specific knockdown in the PG. Multiple 
RNAi lines were tested for each gene whenever available to minimize the false-positive results from off-target effects. *Lines that exhibited the devel- 
opmental delay/arrest phenotype. 

See also Figure S1.

a Kind gift from K. Broadie.

b Lines from Transgenic RNAi Project at Harvard Medical School (obtained from Bloomington Drosophila Stock Center).

c Phenotypic rescue with E feeding was confirmed.



as a 20E exporter to modulate the effective intracellular concen- 
tration of 20E in peripheral tissues in Drosophila (Hock et al., 
2000). E23 is a member of the ABCG subfamily of ABC trans- 
porters, several of which in mammals have been shown to help 
efflux cholesterol and other types of steroids such as estrogens 
and their metabolites (Imai et al., 2003; Janvilisri et al., 2003; 
Klucken et al., 2000; Suzuki et al., 2003; Wang et al., 2004; Yu 
et al., 2005). Based on these previous findings, an in situ hybridization
screen of ABCG transporter genes in the Drosophila
genome (Figure S3) was conducted to identify putative E transporters
highly expressed in the PG. This resulted in the identification
of Atet and CG4822, both of which are highly expressed in
the PG of third instar larvae (Figures 4A and S3). Of these two
genes, only knockdown of Atet in the PG caused developmental
defects indistinguishable from those caused by knockdown of
calcium-regulated vesicle exocytosis genes (Figures 4B-4D), and
these defects were also rescued by E feeding.

We next expressed a fluorescent protein-tagged Atet in the
PG to visualize its subcellular localization. This resulted in the
labeling of both the plasma membrane and Syt1-positive vesicles
(Figure 4E), suggesting that Atet could indeed be involved
in the import of E into these vesicles. The signal of fluorescent
protein-tagged CG4822, in contrast, did not co-localize with that
of Atet in the PG (Figure S3E).

Atet Transports E across Membranes In Vitro in an ATP- 
Dependent Manner 

In order to further examine if Atet is a critical transporter for loading 
E into PG secretory vesicles, we sought to develop an in vitro 
transport assay. We first analyzed the predicted membrane topol- 
ogy of Atet using Phobius, a transmembrane protein topology and 
signal peptide predictor program (Käll et al., 2007). To our surprise,
Atet was predicted to have an extracellular N terminus, despite the
fact that its ATP binding domain is on its N-terminal side (Figure 5A).
The same result was obtained using two other independent
algorithms (TMHMM version 2.0 [Krogh et al., 2001] and
HMMTOP version 2.0 [Tusnády and Simon, 2001]), although the
number of transmembrane domains differed between predic- 
tion algorithms. To determine the actual topology of Atet, we ex- 
pressed an N-terminally HA-tagged Atet in Schneider 2 (S2) cells, 
a cell line derived from a primary culture of late stage Drosophila 
embryos, and immuno-stained the cells in both permeabilized 



and non-permeabilized conditions (Figure 5B). Under permeabilized
conditions, both the surface of the cells and the internal
structures were stained, suggesting that a certain population of
Atet proteins is localized on the plasma membrane, as in the PG
cells. Importantly, under non-permeabilized conditions, N-terminal
staining of Atet was still detected on the surface of the cells,
whereas the control E23 tagged at the intracellular C terminus
was not detected without permeabilizing the cells (Figure 5B).
These observations demonstrate that the N terminus of Atet
is indeed located on the non-cytoplasmic side of the membranes.

Based on this atypical membrane topology of Atet, we designed
an in vitro transport assay using S2 cell membrane vesicle
preparations from cells transfected with Atet (Figure S4). A crude
membrane preparation typically contains both inside-out and
right-side-out vesicles. In a regular vesicular ABC transporter
assay, only the activity of the transporters in the inside-out
vesicle configuration is detected, since a typical ABC transporter
in the right-side-out vesicles will have its ABC domain
inside the vesicles and is therefore unable to access the exogenously
added ATP and transport substrate. Thus, activity is
measured as the amount of substrate imported into the vesicles
(Figure S4). In contrast, in the case of Atet, no net flux into vesicles
would be expected upon addition of exogenous substrates
since Atet is predicted to pump substrates in the opposite direction.
Therefore, in our modified procedure, the substrate E was
preloaded into vesicles during isolation and then ATP was added
to assess the transporter activity as efflux rather than influx
(Figure S4). As shown in Figure 5C, addition of ATP significantly stimulated
efflux of E from Atet-containing vesicles, while no transport
was observed using E23, a putative E transporter with a ‘normal’



Figure 3. Syt-GFP Reveals the Presence of Vesicle-like Structures in the PG

(A) Representative image of the PG from a wandering control larva overexpressing Syt-GFP (phm22>Syt-GFP). The PG is surrounded by a dashed line. Scale bar, 100 μm.

(B) Representative image of the PG from a day 7 (~150 hr AEL) IP3R RNAi larva overexpressing Syt-GFP (phm22>Syt-GFP, IP3R RNAi, dicer2). The PG is surrounded by a dashed line. Scale bar, 100 μm.

(C) Confocal image of the PG from a wandering control larva overexpressing Syt-GFP. Scale bar, 10 μm.

(D) Confocal image of the PG from a day 7 (~150 hr AEL) IP3R RNAi larva overexpressing Syt-GFP. Scale bar, 10 μm.

(E) Magnified view of the PG cells from day 7 (~150 hr AEL) IP3R RNAi larvae overexpressing Syt-GFP. Note aggregation of many small vesicle-like structures along the membrane (arrowheads). Scale bars, 10 μm.

Figure 4. Atet Is Expressed in the PG and Required for Normal Developmental Progression

(A) Atet is highly expressed in the PG, as shown by in situ hybridization of the CNS-RG complex from a wandering larva with an Atet antisense probe. The PG components of the RG are indicated by arrows, and the CNS is surrounded by a dashed line. Scale bar, 100 μm.

(B) Atet knockdown in the PG causes polyphasic growth arrest. Percentages of developmentally arrested larvae for each genotype are shown. Percentages of larvae arrested at first or second instar (L1/L2) are indicated in light gray, whereas those arrested at third instar (L3) are shown in dark gray. E feeding is indicated by + E. Numbers of animals tested are shown on top of each bar. Inset is a representative image of a wandering control larva (control, phm22>) and an Atet RNAi larva arrested as L3 (RNAi, phm22>Atet RNAi). Scale bar, 1 mm.

(C) Atet knockdown in the PG causes developmental delay. Developmental time to pupation among non-arrested larvae is shown. Data are represented as mean ± SEM of three to seven independent experiments.

(D) Atet knockdown in the PG leads to the formation of overgrown pupae. Representative image of a control pupa (control; phm22>), an Atet knockdown pupa (RNAi, phm22>Atet RNAi), and an Atet knockdown pupa rescued by E feeding (RNAi + E, phm22>Atet RNAi with E feeding). Scale bar, 1 mm.

(E) Syt1 and Atet co-localize in the PG. Representative confocal image of the PG from a wandering larva overexpressing Syt-GFP (green) and YPet-Atet (magenta). The square area surrounded by a dashed line in the left panel is magnified on the right. Vesicles labeled by both Syt-GFP and YPet-Atet are indicated by arrowheads. Scale bar, 10 μm.

See also Figure S3.



membrane configuration with respect to the ABC domain. These 
results demonstrate that Atet can indeed transport E from the 
cytoplasmic to non-cytoplasmic side of vesicle membranes, 
providing strong support for our vesicle-mediated E release 
model (Figure 6). 
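The reasoning behind the modified assay can be summarized as a small orientation model. The sketch below is illustrative only: it assumes, as the text describes, that a transporter needs its ABC domain facing the ATP-containing buffer to be active, and that both transporter types move substrate from the cytoplasmic to the non-cytoplasmic side. The class and function names are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transporter:
    name: str
    abc_domain_side: str  # "cytoplasmic" or "non-cytoplasmic"


def outward_face(vesicle_orientation):
    """Return which original membrane face is exposed to the assay buffer."""
    if vesicle_orientation == "inside-out":
        return "cytoplasmic"
    if vesicle_orientation == "right-side-out":
        return "non-cytoplasmic"
    raise ValueError(f"unknown orientation: {vesicle_orientation}")


def predicted_flux(transporter, vesicle_orientation):
    """Predict ATP-driven substrate movement relative to the vesicle lumen.

    Returns "influx", "efflux", or None when the ABC domain is sealed
    inside the vesicle and cannot reach exogenously added ATP.
    """
    face = outward_face(vesicle_orientation)
    if transporter.abc_domain_side != face:
        return None
    # Assumed direction: cytoplasmic side -> non-cytoplasmic side.
    # If the cytoplasmic face is outside, substrate moves into the lumen.
    return "influx" if face == "cytoplasmic" else "efflux"


typical = Transporter("typical ABC transporter", "cytoplasmic")
atet = Transporter("Atet", "non-cytoplasmic")

for t in (typical, atet):
    for orientation in ("inside-out", "right-side-out"):
        print(t.name, orientation, predicted_flux(t, orientation))
```

Under these assumptions, a typical transporter is detectable only as influx into inside-out vesicles, whereas Atet is detectable only as efflux from right-side-out vesicles, which is why the substrate has to be preloaded.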

DISCUSSION 

In the present study, we provide several lines of evidence 
demonstrating that the insect steroid hormone E is secreted 
from the PG not by simple diffusion, but rather through a cal- 
cium signaling-regulated vesicle fusion event. Below we discuss 
three major points of our findings: (1) Atet, an ABCG transporter, 
can facilitate E passage through membranes in an ATP-depen- 
dent manner, (2) GPCR-regulated calcium signaling in the PG 
promotes E release, and (3) the significance of steroid hormone 
release by vesicle exocytosis and its implication for other steroid 
hormone/cholesterol trafficking processes. 



The ABCG Family Member Atet Is an E Transporter 

Atet was originally cloned in Drosophila as an ABC transporter-encoding
gene with unknown function. It was found to be highly
expressed in embryonic trachea, leading to its name ABC transporter
expressed in trachea or Atet (Kuwana et al., 1996). In our
in situ hybridization experiment, however, we found little expression
of Atet in embryonic trachea, but instead saw specific high
level expression in the PG (Figure S3F), consistent with its
expression pattern in the third instar larva (Figure 4A). Since
we found that Atet has an atypical membrane topology (Figures
5A and 5B) and can transport E across membranes in vitro (Figure 5C),
we propose renaming this gene Atypical topology ecdysone
transporter, thereby retaining the Atet gene designation.

Atet belongs to the ABCG subfamily of ABC transporters, 
members of which in mammals have been shown to transport 
cholesterol as well as other steroids, such as estrogens and their 
metabolites, in many biological systems (Imai et al., 2003; Janvilisri
et al., 2003; Klucken et al., 2000; Suzuki et al., 2003; Wang

Figure 5. Atet Is an E Transporter

(A) Membrane topology of a normal ABC-type transporter (left) and of Atet
(right) as predicted by membrane topology prediction servers. White boxes
indicate the ATP-binding cassette (ABC).

(B) Anti-HA staining of S2 cells overexpressing N-terminally tagged Atet 
(HA-Atet) or C-terminally tagged E23 (E23-HA) in permeabilized and non- 
permeabilized conditions. The C terminus of E23 was consistently predicted 
by prediction servers (Phobius, TMHMM and HMMTOP) to be intracellular and 
therefore used as a negative control. 

(C) E transporting activity of Atet and E23 measured by modified vesicular 
transport assay. Data are represented as mean ± SEM of five to six inde- 



et al., 2004; Yu et al., 2005). To our knowledge, however, the 
atypical membrane topology, with the N-terminal ABC domain 
on the non-cytoplasmic side of the membrane, has not been 
reported for any ABC transporter to date. However, this topology 
may have a strong advantage in facilitating tight control on 
E release by preventing Atet from functioning on the plasma 
membrane, due to the lack of ATP in extracellular space. This 
configuration therefore prevents E transport directly through 
the plasma membrane and confines it to a vesicle-mediated 
fusion process, although it requires a separate molecular mech- 
anism to transport ATP into the secretory vesicles. This mecha- 
nism remains unclear at this point, but it may involve a specific 
transporter like the recently described VNUT/SLC17A9 (Sawada
et al., 2008). In this context, it is interesting to note that the 
human Atet orthologs ABCG1 and ABCG4 (Figure S3A) are 
also strongly predicted by membrane topology algorithms to po- 
sition their N-terminal ABC domain on the non-cytoplasmic side. 
These transporters mediate cellular cholesterol efflux (Wang 
et al., 2004) and have recently been shown to work not on the 
plasma membrane but in intracellular endosomes (Tarling and 
Edwards, 2011). Clearly, additional studies on the membrane to- 
pology of ABCG transporters are warranted. 

Separate Signaling Pathways Likely Regulate E 
Production and Release 

The results of our RNAi screening (Table 1) demonstrate that
CG30054, a Gαq subunit, and Plc21C, a PLCβ class enzyme,
are both required for proper PG function. These findings strongly
implicate the existence of an unknown GPCR and cognate ligand 
as mediators of the calcium signaling event that we suggest 
stimulates E release from the PG. On the other hand, we know 
that the PTTH receptor is Torso, a receptor tyrosine kinase (Re- 
witz et al., 2009) and its primary role is to promote E production 
by inducing E biosynthetic enzyme gene transcription. These ob- 
servations suggest that, at least in Drosophila, E production and 
release are likely regulated separately. This machinery might 
help the GPCR ligand to generate large pulses of steroid in a 
timely fashion. The identification of the GPCR as well as its ligand 
is necessary to further pursue this possibility. 

Significance of Vesicle-Mediated E Release and Its 
Implication for Other Processes 

The mechanism of steroid hormone transit through lipid mem- 
branes has not been well studied and in many physiology 
textbooks the issue is not even discussed. When this topic is 
mentioned, the explanation most often given is that steroids can
freely diffuse through lipid membranes (Raven and Johnson,
2002; Sherwood, 2011; White and Porterfield, 2012). Despite 
this prevailing assumption, there are only a few reports where 
such transbilayer transfer of steroids by free diffusion has 
been analyzed. In one theoretical study, it was shown in silico 
that a free energy of solvation-based mechanism can produce 
rapid flux of estradiol, testosterone, and progesterone through 



pendent experiments. **p < 0.01 compared to control (0% transport) from 
Student’s t test. 

See also Figure S4 for the details of the modified vesicular transport assay.

Figure 6. Schematic Illustration of the Vesicle-Mediated E Release Model

Torso is shown in red, an unknown GPCR is shown in pink, and Atet is shown in orange. E is depicted as purple filled circles. ER, endoplasmic reticulum; M, mitochondria.



one, testosterone, and estradiol are signif- 
icantly more hydrophobic than E. There- 
fore, the free energy of solvation into a 
lipid bilayer of E is likely to be much more 
positive than for sex steroids; this may 
preclude the use of a simple diffusion 
mechanism for E. In this respect, E is 
more similar to bile acids, which are also 
highly hydrophilic and need active trans- 
porters to traverse lipid bilayers (Dawson 
et al., 2009). Thus, depending on their spe- 
cific physiochemical properties, different 
steroids might use either simple passive 
diffusion through the plasma membrane, 
active transporters or some combination 
of these mechanisms. 



a simple membrane in concordance with measured rates (Oren 
et al., 2004). However, it is well known that steroid hormone 
transport across membranes can indeed be an active process 
in some situations: there are a number of reports on transporter 
involvement in either uptake or elimination of steroid hormones 
in eukaryotes ranging from yeast to human (Hock et al., 2000; 
Janvilisri et al., 2003; Kralli et al., 1995; Kralli and Yamamoto, 
1996; Mahe et al., 1996). These reports are suggestive enough 
to rationalize a potential mechanism that incorporates steroid 
hormones into secretory vesicles, which enables regulated 
secretion of steroid hormones from steroidogenic tissues. 

Historically, the possibility of vesicle-mediated steroid 
hormone release has been examined using ultrastructural 
and biochemical approaches in multiple biological systems, 
including the corpus luteum in sheep (Gemmell and Stacy, 
1979; Gemmell et al., 1974; Higuchi et al., 1976; Sawyer 
et al., 1979). The proposed vesicle-mediated progesterone 
release from the sheep corpus luteum, however, was later 
challenged, since the peptide oxytocin was shown to be 
present in dense granules by immuno-EM methods (Theodosis 
et al., 1986) and release of oxytocin and progesterone re- 
sponded differently to various secretagogues (Hirst et al., 
1986). Since that time, studies investigating the possibility 
of vesicle-mediated steroid release in any biological system 
have rarely been reported. One relevant and intriguing set of 
studies, however, involved ultrastructural localization of E in 
the PG of the waxworm Galleria mellonella (Birkenbeil, 1983; 
Birkenbeil et al., 1979) using immuno-EM methods. These 
studies suggested that E in the PG is concentrated into what 
appear to be secretory granules that fuse with the plasma 
membrane, but once again no follow up studies have been 
reported in the literature. 

In considering the various models for steroid passage through 
membranes, it is important to note that steroids such as progester- 



In summary, our work provides strong evidence that E is 
released from the PG by calcium-stimulated, vesicle-mediated 
exocytosis. Therefore, we suggest that the prevailing “free diffu- 
sion” model of steroid hormone secretion needs to be reconsid- 
ered. It also follows that if E uses an active export process, then 
the import of many hormones, in particular 20E, is also likely 
controlled by transporters. Given the diversity of physiological 
processes regulated by steroid hormones, additional character- 
ization of the mechanisms responsible for their import and export 
from various cell types and tissues will have significant impact on 
both basic and clinical aspects of steroid hormone physiology. 

EXPERIMENTAL PROCEDURES 
Fly Stocks 

All flies were raised at 25°C on standard medium under a 12 hr/12 hr light/dark
cycle. Aside from the control strains of yw and w, transgenic flies used in the
figures are as follows: phm22-Gal4 (Ou et al., 2011), UAS-IP3R RNAi (#6484,
VDRC), UAS-dicer2 (#60008 and #60009, VDRC), UAS-RasV12 (#4847,
BDSC), UAS-CG34349 RNAi (#31571, VDRC), UAS-Rab3 RNAi (#100787,
VDRC), UAS-Syt1 RNAi (#100608, VDRC), UAS-GCaMP5 (#42038, BDSC),
UAS-gbb RNAi (#5562R-1, NIG-FLY), UAS-Gαq[Q203L] (#30743, BDSC),
UAS-Syt-GFP (#6926, BDSC), UAS-Atet RNAi (#42750, VDRC), UAS-Sec10
RNAi (a gift from K. Broadie) (Andrews et al., 2002), OK72-Gal4 (#6486,
BDSC), and UAS-CG4822 RNAi (#42730 and #105922, VDRC). All the other
RNAi lines and sources used in this study are shown in Table 1. cDNA clones
RE01860 and SD07027 from the Drosophila Genomics Resource Center were
used to tag Atet and CG4822 with YPet or CyPet at their N termini using the
recombineering-mediated method (Venken et al., 2008). These products were then
cloned into pUAST and transgenic flies were generated by BestGene.

Pupariation Timing and Developmental Arrest Phenotype Analyses

Synchronized newly hatched first instar larvae were placed on standard medium 
at 25°C, 25-30 larvae per vial, and pupariation timing of each individual was 
scored periodically. Developmentally arrested larvae were scored individually 
by checking their body and mouth hook characteristics. E feeding rescue was 
performed by adding 0.2 mg/ml (final concentration) of E into the medium. Larvae
were transferred into E-free medium during the early third instar stage (80-104 hr
AEL) to avoid the potential detrimental effect of continuous E feeding.

Extraction of Ecdysteroids from CNS-RG Complexes and 
Hemolymph 

Ten CNS-RG complexes were quickly dissected from larvae in PBS, briefly
rinsed with PBS, and pooled in 300 μl of methanol on ice. The complexes
were thoroughly homogenized by repeatedly passing through 23 gauge
needles. After centrifugation at 4°C for 5 min, the supernatant was pooled
on ice, while the pellet was re-extracted. The resulting extract (= 1 batch)
was stored at -20°C until use. For hemolymph samples, 4 μl of hemolymph
was collected from 10-20 larvae and mixed with 100 μl of methanol on ice.
After vortexing, samples were centrifuged at 4°C for 5 min, and the resulting
supernatant (= 1 batch) was stored at -20°C until use.

Separation of E and 20E Using HPLC 

For each HPLC experiment, three batches of CNS-RG extract or one batch of
hemolymph sample was partially evaporated with a SpeedVac concentrator
and diluted with water to make the methanol concentration 30% or lower.
After centrifugation at room temperature for 10 min to remove precipitates,
the aqueous solution was applied onto a Vydac 218TP C18 column (4.6 ×
250 mm; Grace) using an Amersham Biosciences P-900 pump. The elution was
performed with a linear gradient of 30%-50% methanol over 20 min at a
flow rate of 1 ml/min. The fractions were collected every 30 s, and the amount
of ecdysteroids in each fraction was determined by ecdysteroid ELISA using
half the amount (250 μl) of each fraction. The residual halves of the fractions
corresponding to E (42.5%-44.5% methanol = 1 ml) and 20E (36%-38% methanol
= 1 ml) were pooled and stored at -20°C until ecdysteroid quantification.
This experiment was repeated three times for each biological sample.
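The pooled volumes quoted above follow from the gradient and collection settings. The sketch below reproduces that arithmetic under a simplifying assumption that is not stated in the text: the methanol percentage at the fraction collector tracks the programmed gradient exactly, with no dwell volume or delay. Function names are illustrative.

```python
def gradient_time_min(methanol_pct, start_pct=30.0, end_pct=50.0, duration_min=20.0):
    """Time (min) at which an ideal linear gradient reaches methanol_pct."""
    return (methanol_pct - start_pct) / (end_pct - start_pct) * duration_min


def pooled_window(pct_lo, pct_hi, flow_ml_per_min=1.0, fraction_s=30.0, kept_share=0.5):
    """Return (start min, end min, number of fractions, pooled volume in ml).

    kept_share is the part of each fraction left after the ELISA aliquot
    (half of each 0.5 ml fraction, i.e. 250 microliters).
    """
    t_lo = gradient_time_min(pct_lo)
    t_hi = gradient_time_min(pct_hi)
    fraction_min = fraction_s / 60.0
    n_fractions = round((t_hi - t_lo) / fraction_min)
    fraction_ml = flow_ml_per_min * fraction_min
    return t_lo, t_hi, n_fractions, n_fractions * fraction_ml * kept_share


e_window = pooled_window(42.5, 44.5)         # E
twenty_e_window = pooled_window(36.0, 38.0)  # 20E
print("E:", e_window)
print("20E:", twenty_e_window)
```

Each window spans 2 min of the gradient, which corresponds to four 0.5 ml fractions; pooling the remaining half of each gives the 1 ml stated in the text.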

Ecdysteroid ELISA 

The sample solutions were dried with a SpeedVac concentrator and dissolved
in EIA buffer (100 mM phosphate solution [pH 7.4], containing 0.1% BSA,
400 mM NaCl, 1 mM EDTA, and 0.01% NaN3). 20E AChE tracer (#482200),
20E EIA antiserum (#482202), precoated (mouse anti-rabbit IgG) EIA 96-well
plates (#400007), and Ellman’s reagent (#400050) were all purchased from
Cayman Chemical. The assay was performed according to the manufacturer’s
instructions using synthetic E or 20E (Sigma-Aldrich) as standards.

GCaMP5 Imaging and Data Analysis

Wandering third instar larvae were pinned down to a Sylgard dish and
dissected in HL3 saline (2.5 mM Ca2+) along the dorsal midline. All tissues
with the exception of imaginal discs, CNS-RG complex, and body wall musculature
were removed. For the quantification of the PGs that presented calcium
spikes (Figure 2F), the samples were treated with 1 μM tetrodotoxin in HL3 saline
to block muscle contraction and stabilize the preparation. The samples
were imaged in a Nikon FN1 microscope equipped with an A1R confocal
scan head and a CFI75 Apo LWD 25× 1.1 NA water dipping objective. The
PG was located using transmitted light and the zoom level was adjusted to
include the whole PG within the field of view, which resulted in a pixel size
of 0.25-0.33 μm. GCaMP5 was imaged with a 488 nm laser at constant power
for all samples with an emission window of 500-550 nm. The experimental animals
were coded so that the imager did not know the genotypes analyzed. The
time-lapse runs were analyzed for the presence of macro and micro spikes
in calcium levels (GCaMP5 intensity) within the PG cells before decoding the
genotypes. All image processing and analysis was performed with the Nikon
NIS-Elements software package.

Fluorescence Microscopy 

To avoid disruption of vesicle structures by sample fixation, the PGs were 
dissected and mounted in Schneider’s insect medium (Sigma-Aldrich) and 
imaged immediately with fluorescence microscopy. All confocal images 
were acquired using a Zeiss LSM 710 (Carl Zeiss). A lambda scan was per- 
formed and the signals of Syt-GFP and YPet-Atet were linearly unmixed using 
ZEN 2009 software. For observation of the entire PG in Figures 3A and 3B, 
Zeiss Axio Imager M2 equipped with ApoTome.2 and Plan-Apochromat 20x 
0.8 NA objective (Carl Zeiss) was used. 



In Situ Hybridization 

In situ hybridization with DIG-labeled RNA antisense probe was performed as 
previously described (Chavez et al., 2000). cDNA clones used to generate anti- 
sense probes are shown in Figure S3A. 

Immunocytochemistry 

cDNA clones RE01860 and RE53253 from the Drosophila Genomics Resource
Center were used, respectively, to generate HA-tagged Atet and E23 constructs.
A nonsense mutation found in the E23 clone RE53253 (bp 2766)
was corrected by site-directed mutagenesis (QuikChange Kit; Agilent Technologies)
using the following primers:
5'-GGCCCAGCACCTGGTGTGGTGTGCCGCGGACTCGCAGTCC-3';
5'-GGACTGCGAGTCCGCGGCACACCACACCAGGTGCTGGGCC-3'.
An XbaI site was introduced downstream of the start
codon (Atet) or upstream of the stop codon (E23) and was used to insert a triple
HA epitope. These products were then cloned into the pBRAcpA expression
vector for transient transfection into S2 cells (Rewitz et al., 2009). After transfection,
cells were grown in serum-free M3 medium (Sigma-Aldrich) for 4 days
and attached to concanavalin A-coated slides overnight at 25°C. Membrane
permeabilization was performed with 0.1% Triton X-100 in PBS for 15 min at
room temperature. Cells were stained with rat monoclonal anti-HA antibody
3F10 (1:500; Roche Applied Science) followed by Alexa Fluor 488 Goat Anti-Rat
IgG antibody (1:1,000; Molecular Probes) pre-absorbed with S2 cells overnight
at 4°C. Cells were briefly treated with PBS containing DAPI before being
mounted in Vectashield (Vector Laboratories). Images were acquired using a
Zeiss LSM 710 equipped with C-Apochromat 40× 1.2 NA water immersion
and alpha Plan-Apochromat 100× 1.46 NA oil immersion objectives (Carl
Zeiss).

Modified Vesicular Transport Assay 

S2 cells were grown in 10-cm Petri dishes in serum-free M3 medium (Sigma- 
Aldrich) for 4 days after transfection with pBRAcpA empty vector (control), 
pBRAcpA-Atet, or pBRAcpA-E23-HA. After pelleting by brief centrifugation 
at room temperature, the cells from two dishes were suspended in 4 ml 
ice-cold extraction buffer (50 mM HEPES-KOH [pH 7.5], containing 400 mM 
sucrose, supplemented with Complete, Mini, EDTA-free protease inhibitor 
cocktail, Roche Applied Science) and sonicated for 30 s with a probe sonicator 
two times on ice. The cell lysates were then centrifuged at 1,000 x g for 5 min 
at 4°C to pellet cell debris, and the resultant supernatant was ultracentrifuged 
at 108,000 x g for 30 min at 4°C. The pellet (membrane fraction) was 
suspended in 1.6 ml Assay buffer (50 mM HEPES-KOH [pH 7.5], containing 
10 mM MgCl2) and sonicated for 30 s with a probe sonicator on ice. The 
suspension was then immediately aliquoted (400 µl each) into four 1.5-ml tubes 
containing 50 µl E solution (100 µg/ml in Assay buffer) to incorporate E into 
vesicles. After 15 min, 50 µl of ATP solution (Sigma-Aldrich, 10 mM in Assay buffer) 
was added to each tube to give a final concentration of 1 mM, and the transport 
assay was initiated. After 1 hr incubation at 25°C, the vesicles were collected 
by filtration through Whatman GF/A glass microfiber filters (GE Healthcare). 
After washing three times with 1 ml 50 mM HEPES-KOH [pH 7.5], the filters 
were vigorously vortexed in 800 µl of methanol to extract E, and the amount of E 
was determined by ecdysteroid ELISA. 
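
The stated final ATP concentration follows from the volumes given above, assuming each tube holds only the 400 µl membrane aliquot, 50 µl of E solution, and 50 µl of ATP solution. The short sketch below reproduces that arithmetic. The helper name `final_concentration` is hypothetical, and the resulting E concentration is a derived value that the text does not state.

```python
# Illustrative dilution arithmetic. Volumes and stock concentrations are taken
# from the protocol text; the helper name is a hypothetical choice.
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after diluting `added_ul` of stock into `total_ul` total."""
    return stock * added_ul / total_ul


membrane_ul = 400.0    # membrane suspension aliquot per tube
e_solution_ul = 50.0   # E solution, 100 µg/ml stock
atp_ul = 50.0          # ATP solution, 10 mM stock

# Assumption: nothing else is added to the tube.
total_ul = membrane_ul + e_solution_ul + atp_ul

atp_final_mM = final_concentration(10.0, atp_ul, total_ul)
e_final_ug_per_ml = final_concentration(100.0, e_solution_ul, total_ul)
print(total_ul, atp_final_mM, e_final_ug_per_ml)
```

Under these assumptions the total volume is 500 µl, which gives the 1 mM ATP stated in the protocol.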
SUMMARY 

A long-standing question concerns how stem cells 
maintain their identity through multiple divisions. 
Previously, we reported that pre-existing and newly 
synthesized histone H3 are asymmetrically distrib- 
uted during Drosophila male germline stem cell 
(GSC) asymmetric division. Here, we show that phos- 
phorylation at threonine 3 of H3 (H3T3P) distin- 
guishes pre-existing versus newly synthesized H3. 
Converting T3 to the unphosphorylatable residue 
alanine (H3T3A) or to the phosphomimetic aspartate 
(H3T3D) disrupts asymmetric H3 inheritance. 
Expression of H3T3A or H3T3D specifically in early- 
stage germline also leads to cellular defects, 
including GSC loss and germline tumors. Finally, 
compromising the activity of the H3T3 kinase Haspin 
enhances the H3T3A but suppresses the H3T3D phe- 
notypes. These studies demonstrate that H3T3P dis- 
tinguishes sister chromatids enriched with distinct 
pools of H3 in order to coordinate asymmetric segre- 
gation of “old” H3 into GSCs and that tight regulation 
of H3T3 phosphorylation is required for male germ- 
line activity. 

INTRODUCTION 

Epigenetic phenomena are heritable changes in gene expression 
or function that can persist throughout many cell divisions without 
alterations in primary DNA sequences. By regulating differential 
gene expression, epigenetic processes are able to direct cells 
with identical genomes to become distinct cell types in humans 
and other multicellular organisms. However, with the exception 
of DNA methylation, little is known about the molecular pathways 
leading to epigenetic inheritance (Bonasio et al., 2010; Martin and 
Zhang, 2007). 

Prior research has shown that epigenetic events play particu- 
larly important roles in ensuring both proper maintenance and 
differentiation of several stem cell populations. Many types of 
adult stem cells undergo asymmetric cell division to generate a 
self-renewed stem cell and a daughter cell that will subsequently 
differentiate (Betschinger and Knoblich, 2004; Clevers, 2005; In- 
aba and Yamashita, 2012; Morrison and Kimble, 2006). Mis- 
regulation of this balance leads to many human diseases, 
ranging from cancer to tissue dystrophy to infertility. However, 
the mechanisms of stem cell epigenetic memory maintenance 
as well as how loss of this memory contributes to disease remain 
unknown. 

Recently, we found that during the asymmetric division of the 
Drosophila male germline stem cell (GSC), the pre-existing his- 
tone 3 (H3) is selectively segregated to the self-renewed GSC 
daughter cell whereas newly synthesized H3 is enriched in the 
differentiating daughter cell known as a gonialblast (GB) (Tran 
et al., 2012) (Figure 1A). In contrast, the histone variant H3.3, 
which is incorporated in a replication-independent manner, 
does not exhibit such an asymmetric pattern. Furthermore, we 
found that asymmetric H3 inheritance occurs specifically in 
asymmetrically dividing GSCs, but not in the symmetrically 
dividing progenitor cells. These findings demonstrate that global 
asymmetric H3 histone inheritance possesses both molecular 
and cellular specificity. We proposed the following model to 
explain our findings. 

First, the cellular specificity exhibited by the H3 histone sug- 
gests that global asymmetric histone inheritance occurs 
uniquely in a cell type (GSC) where the mother cell must divide 
to produce two daughter cells each with a unique cell fate. 
Because this asymmetry is not observed in symmetrically 
dividing GB cells, we propose asymmetric histone inheritance 
to be a phenomenon specifically employed by GSCs to establish 
unique epigenetic identities in each of the two daughter cells. 
Second, as stated previously, a major difference between H3 
and H3.3 is that H3 is incorporated into chromatin during DNA 
replication, while the H3.3 variant is incorporated in a replication-independent 
manner. Because this asymmetric inheritance mode 
is specific to H3, we propose a two-step model to explain asym- 
metric H3 inheritance: (1) prior to mitosis, pre-existing and newly 
synthesized H3 are differentially distributed on the two sets of 
sister chromatids, and (2) during mitosis, the set of sister chro- 
matids containing pre-existing H3 is segregated to GSCs, while 
the set of sister chromatids enriched with newly synthesized H3 
Figure 1. H3T3P Distinguishes Pre-existing H3-GFP from Newly Synthesized H3-mKO in Mitotic Male GSCs 

(A) A visual representation of the Drosophila testis tip showing the asymmetric H3 inheritance during male GSC asymmetric cell division. 

(B) A schematic diagram of a two-step model to explain how the asymmetric epigenome is established during S-phase (step one) and recognized followed by 
asymmetric segregation in M-phase (step two) GSC, adapted from Tran et al. (2013). 

(C-E) A prophase GSC where GFP and mKO signals are separable. 

(F-H) A prophase GB where GFP and mKO signals are overlapping. 

(I-N) A prophase GSC where GFP and mKO signals are separable in some chromosomal regions (I, L, and M). Immunostaining using anti-H3T3P (N) showed 
that H3T3P co-localizes more with GFP (J, L, and N) than with mKO (K, M, and N). 

(O-T) A prophase GB where GFP and mKO signals overlap (O, R, and S) and H3T3P (T) shows no preference for either GFP (P and R) or mKO (Q and S). 
(U-Z) A metaphase GSC where GFP and mKO signals are indistinguishable (U, X, and Y) and H3T3P (Z) overlaps with both GFP (V and X) and mKO (W and Y). 
Asterisks in (O), (I), (L), (U), and (X), hub. Scale bars, 5 µm. 

See also Figure S1. 



is segregated to the GB that differentiates (Tran et al., 2012, 
2013) (Figure 1B). 

RESULTS 

H3T3P Distinguishes Pre-existing H3 and Newly 
Synthesized H3 in Mitotic Male GSCs 

To test our proposed two-step model, we used a temporally 
controlled dual-color system to precisely label pre-existing 
H3 with GFP and newly synthesized H3 with monomeric Kusa- 
bira-Orange (mKO) (Tran et al., 2012). Asymmetric segregation 
of H3-GFP and H3-mKO was clearly visualized in anaphase 
and telophase GSCs imaged during the second mitosis 
following heat-shock-induced switch from H3-GFP- to H3- 
mKO-coding sequence (Tran et al., 2012). Here, we show 
that H3-GFP and H3-mKO signals are already separable at 
some chromosomal region in prophase GSCs (Figures 1C- 
1E), likely in regions with less tight cohesion between sister 
chromatids. Such a separation was not detected in a control 
prophase GB (Figures 1F-1H). These results are consistent 
with the hypothesis that the differential distribution between 
pre-existing H3-GFP and newly synthesized H3-mKO is established 
prior to mitosis in GSCs (Figure 1B, step one). By 
contrast, such a separation was not detected using a H3.3 
dual-color transgene under the same heat-shock regime (Fig- 
ure S1A), consistent with our previous report that H3.3 is in- 
herited symmetrically (Tran et al., 2012). 

When immunostaining experiments were performed using an 
antibody recognizing a mitosis-enriched phosphorylation at 
threonine 3 of H3 (H3T3P) (Dai et al., 2005; Polioudaki et al., 
2004), the H3T3P signal (Figures 1J, 1K, and 1N) showed more 
co-localization with H3-GFP (Figures 1I and 1L) than with H3-mKO 
(Figures 1I and 1M) in prophase GSCs where separation 
between H3-GFP and H3-mKO could be visualized (Figures 1I, 
1L, and 1M). By contrast, H3-GFP signals and H3-mKO signals 
were not separable in prophase GBs (Figures 1O, 1R, and 1S), 
and H3T3P did not distinguish between them (Figures 1P, 1Q, 
and 1T). Furthermore, when sister chromatids congressed to 
the equator in metaphase GSCs, such a distinction became un- 
detectable (Figures 1U-1Z), suggesting that H3T3P distin- 
guishes sister chromatids enriched with pre-existing H3 from 
those enriched with newly synthesized H3 in prophase GSCs. 
Figure 2. Expression of an H3T3A Transgene Greatly Reduces H3T3P in Mitotic Germ Cells 

(A-D) Tip of a testis expressing nos>H3T3A-GFP stained with antibodies against a hub marker FasIII, H3T3P, and H3S10P. A prophase germ cell (yellow dotted 
outline) expressing H3T3A-GFP (A) lacks H3T3P (B) but has abundant H3S10P (C) that co-localizes with condensed chromosomes labeled by Hoechst staining 
(D). Two mitotic CySCs (white dotted outline) without H3T3A-GFP (A) have both H3T3P (B) and H3S10P (C) signals co-localized with condensed chromosomes 
labeled by Hoechst staining (D). Asterisks, hub. 

(E-H) Mitotic germ cells (cyan dotted outline) expressing nos>H3-GFP (E) as a control have both H3T3P (F) and H3S10P (G) signals co-localized with condensed 
chromosomes labeled by Hoechst staining (H). Scale bars, 5 µm. 

See also Figure S2. 



Consistent with this potential function of H3T3P, immunostaining 
signals of H3T3P were only detectable in prophase (Figures 
1N and S1D) to metaphase (Figures 1Z and S1D), but not in 
late anaphase (Figure S1D) GSCs. By contrast, immunostaining 
using an antibody against another mitosis-enriched mark, H3S10P 
(phosphorylation at serine 10 of H3), showed abundant signal 
throughout mitosis (Figure S1D). Furthermore, the signal from 
H3T3P immunostaining (Figure S1E) was enriched at, but not 
restricted to, the centromeric region labeled with an antibody 
against a centromere-specific H3 variant, centromere identifier 
(Cid) (Figure S1E). In summary, the temporal and spatial distributions 
of H3T3P in Drosophila male germ cells are comparable to 
what has been reported in other cell types from other systems 
(Caperta et al., 2008; Dai et al., 2005; Escriba and Goday, 
2013; Markaki et al., 2009; Wang et al., 2010). 

Expression of an H3T3A Transgene Greatly Reduces 
H3T3P in Mitotic Germ Cells 

To understand the function of H3T3P in male germ cells, we 
generated fly lines with an H3-GFP transgene carrying a point 
mutation that converts T3 to the unphosphorylatable alanine 
(Ala or A, H3T3A). Expression of the H3T3A-GFP transgene in 
early germ cells by the nanos-Gal4 (nos-Gal4) driver (Van Doren 
et al., 1998) greatly reduced the H3T3P signal (yellow versus 
white outlined cells in Figures 2A and 2B). This reduction of im- 
munostaining signal was specific to H3T3P, as immunostaining 
using anti-H3S10P showed normal signals in H3T3A-expressing 
cells (yellow versus white outlined cells in Figure 2C). As a con- 
trol, expression of the wild-type H3-GFP had no effect on either 
H3T3P (Figures 2E and 2F) or H3S10P (Figure 2G) signals. 



Because endogenous H3 is still abundant in testes in which 
early germ cells are enriched with nos-driven H3T3A expression 
(Figures S2A and S2B), the absence of H3T3P signal suggests a 
dominant negative effect of H3T3A. The dominant negative effect 
of point mutations of H3 has recently been observed with several 
residues of histone H3 (Herz et al., 2014; Lewis et al., 2013). 

Expression of H3T3A Changes the Asymmetric H3 
Segregation Pattern in Mitotic GSCs 

Because expression of the H3T3A provides a loss-of-function 
condition for H3T3P (Figures 2, S2C, and S2D), we next explored 
whether asymmetric histone segregation is affected in H3T3A- 
expressing GSCs using the dual-color labeling strategy (Fig- 
ure 3A). As a control, we used a similar system with wild-type 
H3 and found that pre-existing H3-GFP and newly synthesized 
H3-mKO are asymmetrically segregated in telophase GSCs dur- 
ing the second mitosis after heat-shock-induced genetic switch 
(Figures 3B-3D), consistent with our previous report (Tran et al., 
2012). By contrast, we found a dramatic shift in histone inheri- 
tance patterns from predominantly asymmetric to predominantly 
symmetric pattern (Figures 3H-3J), using the dual-color trans- 
gene with H3T3A (Figure 3A). Although the majority of GSCs 
expressing H3T3A exhibited a symmetric pattern of histone in- 
heritance (Figures 3H-3J), we could still detect the conventional 
asymmetric pattern resembling that of wild-type H3 in telophase 
GSCs (Figures 3E-3G). Surprisingly, we also observed the in- 
verted asymmetric pattern (Figures 3K-3M). 

We reason that if pre-existing and newly synthesized histones 
are randomly incorporated during the first step (Figure 1B), no 
separation between GFP and mKO signals should be detectable 
Figure 3. Expression of H3T3A Changes 
the Asymmetric H3 Segregation Pattern in 
Mitotic GSCs 

(A) A schematic diagram showing the dual color-switch 
design that expresses pre-existing H3T3A-GFP 
and newly synthesized H3T3A-mKO by 
heat-shock treatment, as adapted from Tran et al. 
(2012). 

(B-D) A telophase GSC expressing 
nos>FRT-H3-GFP-PolyA-FRT-H3-mKO-PolyA (nos>H3) during 
the second mitosis after heat-shock-induced 
genetic switch shows the conventional asymmetric 
segregation pattern. 

(E-M) Telophase GSCs expressing 
nos>FRT-H3T3A-GFP-PolyA-FRT-H3T3A-mKO-PolyA 
(nos>H3T3A) during the second mitosis after 
heat-shock-induced genetic switch show the conventional 
asymmetric segregation pattern (E-G), the symmetric 
pattern (H-J), or the inverted asymmetric pattern 
(K-M). 

(N-P) A prophase GSC expressing 
nos>FRT-H3T3A-GFP-PolyA-FRT-H3T3A-mKO-PolyA 
(nos>H3T3A) during the second mitosis after 
heat-shock-induced genetic switch shows separable GFP 
and mKO signals. Asterisk, hub; white dotted 
outline, mitotic GSCs at telophase (B-M) or prophase 
(N-P); arrowheads, interphase GSCs or GBs 
that show much less condensed nuclei. Scale bars, 
5 µm. 
Figure 4. Expression of H3T3A or H3T3D 
Changes Pre-existing and Newly Synthesized 
H3 Distribution Patterns in Post-Mitotic 
GSC-GB Pairs 

(A-I) Immunostaining signals using antibodies 
against a hub marker FasIII and spectrosome/fusome 
marker α-spectrin in testes from 
nos>FRT-H3-GFP-PolyA-FRT-H3-mKO-PolyA 
(nos>H3, A-C) or 
nos>FRT-H3T3A-GFP-PolyA-FRT-H3T3A-mKO-PolyA (nos>H3T3A, D-I) males 
after the second mitosis upon heat-shock-induced 
genetic switch. Asterisk, hub; white dotted 
outline, post-mitotic GSC-GB pairs; arrowheads, 
spectrosome structure in between GSC and GB 
cells. Scale bars, 5 µm. 

(J) Quantification of the ratio of GFP (y axis: log2 
scale) fluorescence intensity in GSC-GB pairs (see 
Figures S3A, S3B, and Table S1 for details): 
nos>H3 (open circle, n = 55), nos>H3T3A (solid 
triangle, n = 64), and nos>H3T3D (open square, 
n = 57). Red dotted outline delineates symmetric 
distribution zone (see explanations below). H3 (n = 
55): GSC/GB GFP ratio = 10.11 ± 1.66 (p < 10“^ 
for the ratio >1, one-tailed t test). H3T3A (n = 64): 
GSC/GB GFP ratio = 1.50 ± 0.28 (p > 0.05, 
and therefore not significantly different from 1, 
two-tailed t test). H3T3D (n = 57): GSC/GB GFP ratio = 
1.56 ± 0.51 (p > 0.05, and therefore not significantly 
different from 1, two-tailed t test). All ratios = 
Avg ± SE; p value, one-sample t test. 

(K) Percentage of GSC-GB pairs with conventional 
asymmetric (GFP in GSC/GB >1.55), symmetric 
(GSC/GB GFP ratio between 1-1.45 or GB/GSC 
GFP ratio between 1-1.45), inverted asymmetric 
(GFP in GB/GSC >1.55), and borderline (GSC/GB 
GFP ratio between 1.45-1.55 or GB/GSC GFP 
ratio between 1.45-1.55) patterns, respectively, in 
nos>H3, nos>H3T3A, and nos>H3T3D testes, as 
well as the predicted patterns according to randomized 
segregation modeling (Table S2). In 
nos>H3 testes, conventional asymmetric: 87.3% 
(48/55); symmetric: 12.7% (7/55); no inverted 
asymmetric or borderline pairs. In nos>H3T3A testes, conventional asymmetric: 9.4% (6/64); symmetric: 71.9% (46/64); inverted asymmetric: 12.5% (8/64); 
borderline: 6.3% (4/64). In nos>H3T3D testes, conventional asymmetric: 7.0% (4/57); symmetric: 79.0% (45/57); inverted asymmetric: 10.5% (6/57); borderline: 
3.5% (2/57). Predicted patterns: conventional asymmetric: 18.7% (12/64); symmetric: 53.1% (34/64); inverted asymmetric: 18.7% (12/64); borderline: 9.4% (6/64). 
See also Figures S3A and S3B and Tables S1 and S2. 
during GSC asymmetric division. The fact that we could still identify 
conventional and inverted asymmetric segregation patterns 
in telophase GSCs (Figures 3E-3G and 3K-3M) suggests that 
the establishment of histone asymmetry prior to mitosis may 
not be affected. The observed defects in proper asymmetric 
segregation therefore arise upon mitotic entry when sister chromatids 
containing different populations of H3 need to be recognized 
and segregated to the appropriate daughter cell (Figure 1B, 
step two). Consistent with this hypothesis, separable H3T3A- 
GFP and H3T3A-mKO could still be detected in prophase 
GSCs (Figures 3N-3P and S1B), but not in a control prophase 
GB (Figure S1C). 

Expression of H3T3A Changes H3 Distribution Patterns 
in Post-Mitotic GSC-GB Pairs 

Since mitotic GSCs account for <2% of all GSCs (Sheng 
and Matunis, 2011; Yadlapalli et al., 2011; Yadlapalli and Yamashita, 
2013), we next examined post-mitotic GSC-GB pairs 
derived from GSC asymmetric divisions to quantify histone in- 
heritance patterns (Tran et al., 2012) (Experimental Procedures). 

In contrast to the conventional asymmetric distribution pattern 
in wild-type H3-expressing GSC-GB pairs (Figures 4A-4C), we 
observed symmetric (Figures 4D-4F), conventional asymmetric 
(left pair in Figures 4G-4I), and inverted asymmetric (right pair 
in Figures 4G-4I) distribution patterns in post-mitotic GSC-GB 
pairs. These data are consistent with what we have observed 
with mitotic GSCs (Figures 3E-3M). 

Next, we quantified the percentage of each of these distribution 
patterns. We mainly used the GFP signal to classify the different 
patterns. For example, in Figure 4J, the conventional asymmetric 
patterns are in zone I, with GFP ratio in GSC/GB >1.55; the symmetric 
patterns are in zone II, with GFP ratio in GSC/GB <1.45 
but >0.69 (i.e., GB/GSC <1.45); and the inverted asymmetric 
patterns are in zone III, with GFP ratio in GB/GSC >1.55. 
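
The zone definitions above amount to a simple threshold rule on the GSC/GB GFP ratio. The sketch below is one possible way to express that rule. The function name `classify_pair`, the treatment of values exactly at 1.45 or 1.55 as borderline, and the requirement that both intensities be positive are assumptions made for illustration and are not specified in the text.

```python
# Illustrative classifier for GSC-GB pairs based on the fold thresholds quoted
# in the text. Names and boundary handling are assumptions.
SYMMETRIC_MAX_FOLD = 1.45   # below this fold difference: symmetric (zone II)
ASYMMETRIC_MIN_FOLD = 1.55  # above this fold difference: asymmetric (zone I or III)


def classify_pair(gsc_gfp: float, gb_gfp: float) -> str:
    """Classify one GSC-GB pair from its two GFP fluorescence intensities."""
    if gsc_gfp <= 0 or gb_gfp <= 0:
        raise ValueError("intensities must be positive")
    ratio = gsc_gfp / gb_gfp
    # Fold difference regardless of which cell carries more GFP.
    fold = max(ratio, 1.0 / ratio)
    if fold < SYMMETRIC_MAX_FOLD:
        return "symmetric"
    if fold <= ASYMMETRIC_MIN_FOLD:
        return "borderline"
    return "conventional asymmetric" if ratio > 1.0 else "inverted asymmetric"


for gsc, gb in [(10.11, 1.0), (1.2, 1.0), (1.5, 1.0), (1.0, 2.0)]:
    print(gsc, gb, classify_pair(gsc, gb))
```

Using the fold difference in either direction keeps the symmetric zone consistent with the stated lower bound of roughly 0.69, which is 1/1.45.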
Figure 5. Both Germline and Somatic Gonadal Cells Show Defects in nos>H3T3A or nos>H3T3D Testes 

(A-L) Immunostaining using antibodies against a hub marker FasIII and spectrosome/fusome marker α-spectrin in testes from nos>H3-GFP (A-D), nos>H3T3A-GFP 
(E-H), or nos>H3T3D-GFP (I-L) males 7 days after eclosion. Asterisks in (A), (E), and (I), hub; arrowheads in (C), (G), and (K) point to the hub region, which is 
shown at higher magnification in insets: hub size increases in nos>H3T3A-GFP (inset in G) or nos>H3T3D-GFP (inset in K) testes, but not in nos>H3-GFP (inset 
in C) testes. Early-stage germ cells, as determined by nos-driven GFP expression (B, F, and J) and nuclear morphology (Chen et al., 2013; Tran et al., 2000), are 
delineated by the yellow dotted lines in (D), (H), and (L). Scale bars, 20 µm. 



The ~1.5-fold cutoff is based on the quantification range of 
symmetric H3 distribution in spermatogonial cells and symmetric 
H3.3 distribution in GSC-GB pairs (Tran et al., 2012, 2013). 
We reasoned that GFP ratio reflects the establishment of 
asymmetric histone distribution on sister chromatids more reli- 
ably than mKO ratio for two reasons. First, when we measured 
mKO fluorescence intensity in post-mitotic GSC-GB pairs, 
both cells are actively undergoing S phase for the next mitosis 
and exhibit robust incorporation of mKO-labeled newly synthe- 
sized histones (Figure 4C). Second, any histone turn-over that in- 
corporates newly synthesized mKO-labeled histones (Deal et al., 
2010; Dion et al., 2007) during processes such as transcription 
may not be sister chromatid-specific. 

When we quantified the GFP distribution patterns in post- 
mitotic GSC-GB pairs in H3T3A-expressing testes (Figures 4J 
and S3A), we found that 71.9% (46/64) of pairs showed a symmetric 
pattern of inheritance (Figure 4K; Table S1). By contrast, 
in wild-type H3-expressing testes, 87.3% (48/55) of pairs 
showed an asymmetric pattern of inheritance (Figure 4K; Table 
S1). Moreover, in H3T3A-expressing testes, asymmetric patterns 
could be observed in two distinct modes at lower frequencies: 
9.4% (6/64) conventional asymmetry and 12.5% (8/64) inverted 
asymmetry, with 6.3% (4/64) at the borderline (1.45- to 
1.55-fold) between asymmetry and symmetry (Figure 4K; Table 
S1). Notably, no GSC-GB pair showed the inverted asymmetric 
pattern (zone III in Figure 4J) in wild-type H3-expressing 
testes (Figures 4J and 4K), suggesting that such a pattern is 
specifically induced by H3T3A expression. 
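
The percentages quoted above can be recomputed directly from the pair counts. The sketch below does so for the H3T3A counts and the wild-type asymmetric count. It uses half-up rounding to one decimal place, which is an assumption about how the reported values were rounded, and the names `percent` and `h3t3a_counts` are illustrative.

```python
# Illustrative recomputation of the reported percentages from the pair counts.
# Half-up rounding is an assumption; the counts themselves come from the text.
from decimal import Decimal, ROUND_HALF_UP


def percent(count: int, total: int) -> Decimal:
    """Percentage of `count` in `total`, rounded half-up to one decimal place."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


h3t3a_counts = {
    "symmetric": 46,
    "conventional asymmetric": 6,
    "inverted asymmetric": 8,
    "borderline": 4,
}
h3t3a_total = sum(h3t3a_counts.values())
h3t3a_percent = {name: percent(n, h3t3a_total) for name, n in h3t3a_counts.items()}

wild_type_asymmetric = percent(48, 55)

for name, value in h3t3a_percent.items():
    print(f"{name}: {value}% ({h3t3a_counts[name]}/{h3t3a_total})")
print(f"wild-type conventional asymmetric: {wild_type_asymmetric}% (48/55)")
```

Decimal arithmetic is used here because binary floating-point rounding of a value such as 6.25 would not round half-up.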

Expression of H3T3A Causes Several Germline Defects 

A spectrum of cellular defects could be detected in nos>H3T3A 
testes after the level of H3T3P is effectively reduced (Figures 
S2C and S2D). Compared to testes expressing the wild-type H3 
(Figures 5A-5D, S4A, S4D, and S5A), H3T3A-expressing testes 
exhibited phenotypes with both germline and somatic defects (Fig- 
ures 5E-5H, 5M-5P, S4B, S4E, and S5B). First, GSCs expressing 
the H3T3A transgene were not maintained properly. In testes 
without transgene or expressing H3-GFP, only germ cells with 
dotted spectrosome structure (de Cuevas and Spradling, 1998; 
Hime et al., 1996; Lin et al., 1994) were detectable next to the 
hub cells (Figure S4A, arrows). However, in nos>H3T3A testes, 
germ cells with branched fusome structure were detected adja- 
cent to the hub region (arrowheads in Figure S4B), suggesting 
that GSCs either undergo precocious differentiation or cell death, 
thereby allowing more differentiated spermatogonial cysts to take 
their place. Quantification of these two distinct cellular structures 
(spectrosome versus fusome) showed a significant loss of GSCs 
in H3T3A-expressing testes (Figure S4C). Second, we observed 
a significant expansion of germline tumors carrying early-stage 
cellular markers, including nos-driven GFP expression (Figures 
5E, 5F, S4E, and S5B), spectrosome structure (Figures 5E, 5G, 
S4E, S5B, and S5D), and condensed nuclei (Chen et al., 2013; 
Schulz et al., 2004; Tran et al., 2000) (Figures 5H and S5B). Interest- 
ingly, based on these cellular markers, the tumors of progenitor 
germ cells developed in nos>H3T3A testes were noticeably het- 
erogeneous (Figure S5D). For example, some tumor cells main- 
tained strong GFP expression (Figure S5D), a mark indicative of 
active nos-Gal4 activity, and exhibited spectrosome structure (Figure 
S5D), suggesting that they are an early-stage GSC and/or GB 
cell tumor. Conversely, other tumor cells exhibited loss of GFP 
expression and a fusome structure (Figure S5D), suggesting that 
they are a later-stage spermatogonial tumor. We reason that this 
heterogeneity in tumor types is likely due to the heterogeneity 
observed in histone inheritance patterns (Figures 3 and 4). Third, 
the nos>H3T3A males had gradually decreased fertility (Fig- 
ure S5C), consistent with the progression of germline defects (Fig- 
ure S5B) and eventual germ cell loss (Figures 5M-5P and 5U). 
While the progenitor germ cell tumor phenotype was not detected 
in nos>H3 (n = 19) control testes, it was observed in 42.9% of 
nos>H3T3A testes (n = 42) (Figure 5U). The germ cell loss phenotype 
was detected in 15.8% of nos>H3 (n = 19) control testes but in 
47.6% of nos>H3T3A testes (n = 42) (Figure 5U). The loss of germ 
cells in 15.8% of control testes is likely due to an age-related effect 
(Boyle et al., 2007; Cheng et al., 2008; Toledano et al., 2012; Wallenfang 
et al., 2006). Last, nos>H3T3A testes (Figures 5G, inset, 
and S4B, yellow outline) showed a substantial hub enlargement 
(Figure 5V) compared to nos>H3 testes (Figures 5C, inset, and 
S4A, yellow outline), most likely as a secondary defect due to 
GSC loss as reported previously (Dinardo et al., 2011; Gonczy 
and DiNardo, 1996; Monk et al., 2010; Tazuke et al., 2002). In summary, 
development of these germline defects in adult flies suggests 
that H3T3P is likely required for both GSC maintenance 
and proper differentiation of GB. 

Expression of H3T3A in Late-Stage Germ Cells or 
Somatic Cells Does Not Cause Germline Tumors 

The GSC loss, germline tumor and hub enlargement phenotypes 
in nos>H3T3A testes were specifically caused by expressing 
H3T3A in early-stage germ cells. We used a later-stage germline 
driver, bam-Gal4 (Cheng et al., 2008; Eun et al., 2014; Schulz 
et al., 2004) (Figure 6A), to turn on the same H3T3A transgene 
in four-cell and later stage germ cells. In doing so, we were 
able to effectively reduce H3T3P in the more differentiated 
germ cells (Figure 6G). However, in this population of symmetri- 
cally dividing cells, we did not detect the phenotypes (Figures 
6J-6M) we had observed in nos>H3T3A testes (Figures 5, S4, 
and S5). 

In addition to GSCs, another type of adult stem cell residing in 
the Drosophila testis niche is the cyst stem cell (CySC), which, 
under normal conditions, is the only mitotically active somatic 



(M-T) Immunostaining using a germ cell-specific anti-Vasa in testes from nos>H3T3A-GFP (M-P) or nos>H3T3D-GFP (Q-T) males. Both germ cell loss (M-T) and 
germline tumors (white dotted outline in Q-T) are detectable. Hoechst stains nuclei in (D), (H), (L), (P), and (T). Scale bars, 20 µm. 

(U) Quantification of the percentage of testes with germline tumor and/or germ cell loss in testes expressing nos>H3-GFP (n = 19), nos>H3T3A-GFP (n = 42), or 
nos>H3T3D-GFP (n = 43). 

(V) Quantification of hub size: 108 ± 2.393 µm² in nos>H3-GFP (n = 50) testes versus 198.5 ± 15.22 µm² in nos>H3T3A-GFP testes (n = 37) (***p < 10“"^) or 145.2 ± 
9.702 µm² in nos>H3T3D-GFP testes (n = 37) (***p < 10“"^). All ratios = Avg ± SE; p value calculated by unpaired t test. 

See also Figures S4 and S5. 
Figure 6. Expression of H3T3A or H3T3D Using the bam-Gal4 Driver Did Not Phenocopy Defects in nos>H3T3A or nos>H3T3D Testes 

(A) A cartoon showing stage-specificity of nos-Gal4 and bam-Gal4 drivers: nos-Gal4 is turned on in early-stage germline, including GSCs (Van Doren et al., 1998), 
while bam-Gal4 is expressed from four-cell spermatogonial cells (Cheng et al., 2008; Eun et al., 2014; Schulz et al., 2004). 

(B-I) Immunostaining using antibodies against the germ cell-specific marker Vasa, H3T3P, and H3S10P in bam>H3T3A-GFP testes. Expression of bam>H3T3A 
greatly reduces H3T3P in later stage mitotic spermatogonial cells: a two-cell mitotic spermatogonial cyst (white dotted outline in B-E) without H3T3A-GFP (B) had 
detectable H3T3P (C) and H3S10P (D), and both H3T3P and H3S10P overlapped with DNA signal stained with Hoechst (E). By contrast, a four-cell mitotic 
spermatogonial cyst (white dotted outline in F-I) with H3T3A-GFP (F) had greatly reduced H3T3P (G) but abundant H3S10P (H), and the H3S10P signal overlapped with 
DNA signal stained with Hoechst (I). The diffuse signal in (C) and (G) came from anti-Vasa, which stains the entire mitotic germ cells because their nuclear 
envelopes are broken down (Yadlapalli et al., 2011; Yuan et al., 2012). Scale bars, 10 µm. 

(J-Q) Immunostaining using antibodies against a hub marker FasIII, spectrosome/fusome marker α-spectrin, and Vasa: tip of the testis expressing bam>H3T3A-GFP 
(J-M) or bam>H3T3D-GFP (N-Q). Scale bars, 20 µm. Asterisks in (B-K), (N), and (O), hub. 

See also Figure S6. 
gonadal cell type (Dinardo et al., 2011). When we used a somatic 
cell-specific Tj-Gal4 driver (Tanentzapf et al., 2007) to express 
H3T3A, we found that it is sufficient to reduce the H3T3P signal specifically 
in CySCs (Figure S6A). However, no dramatic cellular defects 
could be detected when comparing Tj>H3T3A (Figure S6C) 
with Tj>H3 testes (Figure S6B). In summary, these stage-specific 
and cell type-specific effects caused by H3T3A expression suggest 
that the phenotype we observed in nos>H3T3A testes is 
unlikely to be the result of a global perturbation of general cellular 
machineries. 

Expression of H3T3D in Early-, but Not Late-Stage, Germ 
Cells Leads to Randomized H3 Inheritance and Cellular 
Defects 

To further understand how H3T3P functions in GSCs, we ex- 
pressed a different H3T3 mutant for which the T3 residue was 
converted to the phosphomimetic aspartic acid (D), under the 
hypothesis that such a mutation may disrupt the temporal order 
of H3T3 phosphorylation (Figures 1I-1N and S1D). Indeed, 
expression of H3T3D in early germ cells using a similar dual-color 
labeling strategy (as described for H3T3A in Figure 3A) also ran- 
domizes pre-existing H3T3D and newly synthesized H3T3D 
inheritance patterns (Figures 4J, 4K, and S3B; Table S1): approximately 
79.0% (45/57) of GSC-GB pairs showed symmetric 
inheritance patterns, 7.0% (4/57) showed conventional asymmetry, 
and 10.5% (6/57) showed inverted asymmetry, with the 
remaining 3.5% (2/57) of pairs at the borderline between asymmetry 
and symmetry (1.45- to 1.55-fold). The randomized H3T3D 
inheritance patterns cannot be attributed to loss of H3T3P, as 
H3T3P is still detectable in H3T3D-expressing GSCs (Fig- 
ure S3C). These data suggest that it is likely the timing of the 
H3T3 phosphorylation that is important for normal GSC activity. 

In addition, both progenitor germline tumor (Figures 5I-5L and 
S4F) and germ cell loss (Figures 5Q-5T) phenotypes could be 
detected in nos>H3T3D testes (Figure 5U). Quantification 
showed a significant decrease of GSCs in nos>H3T3D testes 
(6.84 ± 0.41, n = 37) compared to that of the control nos>H3 
testes (8.68 ± 0.31, n = 19; p < 0.001). Moreover, similar to the 
nos>H3T3A testes, the hub region in nos>H3T3D testes was 
also enlarged compared to the control nos>H3 testes (Figures 
5V and S4F), most likely as a secondary effect due to the loss 
of GSCs. By contrast, no germline tumor phenotype was found 
when the same transgene H3T3D-GFP was driven by the bam- 
Gal4 driver (Figures 6N-6Q). 

Since both reduction of H3T3P by expression of H3T3A and 
the mimicking of H3T3P by expression of H3T3D result in similar 
histone inheritance and germline defects, we hypothesize that 
phosphorylation of H3T3 might require a tight temporal control 
during GSC mitosis. Therefore, expressing either the H3T3A or 
the H3T3D may lead to loss of this control and similar defects 
in histone inheritance patterns as well as abnormal germline 
activity. 

Differential Effects of haspin Gene Mutations on 
Germline Tumor Phenotypes in H3T3A- and 
H3T3D-Expressing Testes 

The kinase that generates the H3T3P mark has been identified as 
the Haspin protein (Dai et al., 2005). By driving a short hairpin 
RNA (shRNA) (Ni et al., 2011) with the nos-Gal4 driver to knock 
down haspin, specifically in early-stage germ cells, we were 
able to observe a significant decrease of H3T3P in GSCs (Fig- 
ure S7A). Testes expressing nos>haspin shRNA showed a 
much greater frequency of cell death (Figures S7C and S7D) 
confined mainly to spermatogonial cells (Yacobi-Sharon et al., 
2013), when compared to the nos-Gal4 control (Figure S7B). 
Even though spermatogonial cell death was also detected in 
nos>H3T3A testes (and in bam>H3T3A testes), germline tumor 
phenotype was much more prevalent in nos>H3T3A testes 
than in nos>haspin shRNA testes. The similarity between 
nos>H3T3A and nos>haspin shRNA phenotypes is consistent 
with the fact that both lead to reduced H3T3 phosphorylation. 
The difference between nos>H3T3A and nos>haspin shRNA 
phenotypes suggests that the phenotypes induced by H3T3A 
expression are not simply a byproduct of compromising Haspin 
kinase activity in general. It is likely that Haspin targets some
as-yet-unknown substrates other than H3T3 in Drosophila GSCs.
For instance, the yeast Haspin homolog has been shown to have 
potential roles in regulating mitotic spindle polarity (Panigada 
et al., 2013). It has also been reported that knockdown of Haspin 
in human cells (Wang et al., 2010; Yamagishi et al., 2010; reviewed
by Higgins, 2010) or in Xenopus (Kelly et al., 2010) results
in mitotic spindle defects. 

To further understand potential interactions between Haspin 
and loss-of-H3T3P phenotypes, we first asked whether halving 
the level of Haspin could enhance the nos>H3T3A phenotype. 
For this, we utilized a set of permissive conditions described 
hereafter to create a sensitized genetic background. Due to the 
temperature sensitivity of the Gal4:UAS system, flies grown at 
lower temperature (i.e., 18°C) have been shown to have reduced 
levels of Gal4-driven expression (Eliazer et al., 2011). In testis 
samples from nos>H3T3A flies grown at 18°C at an earlier developmental
stage (3rd instar larvae), we found that H3T3P is still
abundant and cellular defects were minimal. For example, no 
obvious germline tumor was detected (n = 18, Figures 7A-7D). 
Therefore, we utilized these conditions as a permissive but sensi- 
tized genetic background. In this background, if Haspin level was 
halved (using a deficiency chromosome that uncovers the haspin 
gene locus; Figures 7E-7H), increased germ cell tumors could be 
detected (56%, n = 19, Figure 7M). These tumors were identified
using a variety of morphological features, including expansion of
germ cells with nos-Gal4-driving GFP expression (Figure 7F
versus 7B), spectrosome structure (Figure 7G versus 7C), and
condensed nuclei (Figure 7H versus 7D). Enhancement of the
germline tumor phenotype in nos>H3T3A testes was also detected
using a hypomorphic haspin allele (Venken et al.,
2011), although with less severity (Figures 7I-7L) and lower penetrance
(21%, n = 16, Figure 7M). In summary, these data showed
that in nos>H3T3A testes the germline tumor phenotype could
be enhanced by loss of function in the haspin gene.

We next explored the genetic interaction between haspin and 
the nos>H3T3D phenotype by utilizing a set of restrictive conditions:
flies were grown at 18°C, shifted to 29°C as newly
eclosed flies, and kept at 29°C for 7 days. Under these conditions,
nos>H3T3D testes showed a strong phenotype with high penetrance.
We found that when Haspin level was halved using the
same deficiency chromosome that uncovers the haspin gene 
Figure 7. Genetic Interactions between haspin Gene Mutants and Mutations of H3T3

(A-L) Immunostaining using antibodies against a hub marker FasIII, spectrosome/fusome marker α-spectrin, and germ cell marker Vasa in larval testes from
nos>H3T3A (A-D), Df(haspin)/+; nos>H3T3A (E-H), or haspin (hypomorphic allele)/+; nos>H3T3A (I-L) males at constant 18°C. Early-stage germline tumor is detected in testes
from Df(haspin)/+; nos>H3T3A (F) and (H) or haspin (hypomorphic allele)/+; nos>H3T3A (J) and (L) males, but not in testes from nos>H3T3A (B) and (D) males. Arrowhead in (G)
points to enlarged hub area compared to (C).

(M) Percentage of testes that are normal or have germline tumor(s) from males of the following genotypes: nos>H3T3A (n = 18); Df(haspin)/+; nos>H3T3A (n = 16);
and haspin (hypomorphic allele)/+; nos>H3T3A (n = 19).

(N-U) Immunostaining using antibodies against FasIII, α-spectrin, and Vasa in testes from nos>H3T3D (N-Q) or Df(haspin)/+; nos>H3T3D (R-U) males (siblings
from the same crosses) grown at 18°C, shifted to 29°C as newly eclosed flies and kept at 29°C for 7 days. Early-stage germline tumor is detected in testes from
nos>H3T3D males (O) and (Q), but less severe in testes from nos>H3T3D; Df(haspin)/+ males (S) and (U). Arrowhead in (P) points to enlarged hub area compared
to (T). Early-stage germ cells, as determined by nos-driven GFP expression (B), (F), (J), (O), and (S), and nuclear morphology are delineated by the yellow dotted
lines in (D), (H), (L), (Q), and (U).

(V) Percentage of testes that have germline tumor(s) or germ cell loss from males with the following genotypes: nos>H3T3D (n = 17) or Df(haspin)/+; nos>H3T3D
(n = 12).

(W) Quantification of hub size: 172.2 ± 14.72 μm² in nos>H3T3D (n = 17) testes versus 115.0 ± 9.802 μm² in Df(haspin)/+; nos>H3T3D (n = 12) testes (all ratios =
Avg ± SE; *p < 0.005, calculated by unpaired t test). Asterisks in (A), (E), (I), (N), and (R), hub. Scale bars, 20 μm.

See also Figure S7.



locus, both germline tumor and germ cell loss phenotypes in
nos>H3T3D testes were suppressed, as indicated by lower
severity (compare Figures 7N-7Q with Figures 7R-7U) and
reduced penetrance (Figure 7V). Consistently, the secondary
hub enlargement defect in nos>H3T3D testes was also suppressed
(Figure 7W). These findings are reminiscent of published
studies in which expression of a phosphomimetic substrate can
rescue the phenotypes of compromised kinase activity in cancer
cells (Wu et al., 2010). Together, the opposite genetic interactions
between haspin and the two H3 mutations on T3 further
support the hypothesis that H3T3P needs to be tightly controlled
for proper H3 inheritance and germline activity.
DISCUSSION

Here, we report that a mitosis-enriched H3T3P mark acts as a 
transient landmark that distinguishes sister chromatids with 
identical genetic code but different epigenetic information, 
shown as pre-existing H3-GFP and newly synthesized H3- 
mKO. By distinguishing sister chromatids containing different 
epigenetic information, H3T3P functions to allow these molecu- 
larly distinct sisters to be segregated and inherited differentially 
to the two daughter cells derived from one asymmetric cell divi- 
sion. The selective segregation of different populations of his- 
tones likely allows these two cells to assume distinct fates: 
self-renewal versus differentiation. Consequently, loss of proper 
epigenetic inheritance might lead to defects in both GSC main- 
tenance and GB differentiation, suggesting that both cells need 
this active partitioning process to either “remember” or “reset” 
their molecular properties. 

The temporal and spatial specificities of H3T3P make it a great 
candidate to regulate asymmetric sister chromatid segregation. 
First, H3T3P is only detectable from prophase to metaphase, the 
window of time during which the mitotic spindle actively tries to 
attach to chromatids through microtubule-kinetochore interactions.
Second, the H3T3P signal is enriched at the peri-centromeric
region, where kinetochore components robustly crosstalk
with chromatin-associated factors. Third, H3T3 shows a sequential
order of phosphorylation, first appearing primarily on sister
chromatids enriched with pre-existing H3 and then subsequently 
appearing on sister chromatids enriched with newly synthesized 
H3 as the GSC nears metaphase. The distinct temporal patterns 
shown by H3T3P are unique to GSCs and would allow the 
mitotic machinery to differentially recognize sister chromatids 
bearing distinct epigenetic information, an essential step necessary
for proper segregation during asymmetric GSC division.
Furthermore, the tight temporal control of H3T3 phosphorylation
suggests that rather than serving as an inherited epigenetic
signature, H3T3P may act as a transient signaling mark to allow
for the proper partitioning of H3. We hypothesize that H3T3P 
needs to be under tight temporal control in order to ensure 
proper H3 inheritance and germline activity. 

Our studies have shown that H3T3P is indeed subject to strin- 
gent temporal controls during mitosis. The H3T3P mark is unde- 
tectable during G2 phase. Upon entry to mitosis, sister chroma- 
tids enriched with pre-existing H3-GFP histone begin to show 
H3T3 phosphorylation prior to sister chromatids enriched with 
newly synthesized H3-mKO. As the cell continues to progress to- 
ward metaphase, H3T3P signal begins to appear on sister chro- 
matids enriched with newly synthesized H3-mKO. Such a tight 
regulation of H3T3P is compromised when levels of H3T3P are 
altered due to the incorporation of mutant H3T3A or H3T3D. 
Incorporation of the H3T3A mutant results in a significant 
decrease in the levels of H3T3P on sister chromatids throughout 
mitosis, such that neither sister becomes enriched with H3T3P 
as the GSC progresses toward metaphase. Conversely, incorpo- 
ration of the H3T3D mutant would result in seemingly elevated 
levels of H3T3P early in mitosis. Although H3T3A and H3T3D 
act in different ways, both mutations significantly disrupt the 
highly regulated temporal patterns associated with H3T3 phosphorylation,
the result of which is randomized H3 inheritance
patterns and germ cell defects in testes expressing either
H3T3A or H3T3D. 

To further evaluate the extent of H3T3A and H3T3D roles in the 
segregation of sister chromatids enriched with different populations
of H3 during mitosis (Figure 1B, step two), we modeled all
possible segregation patterns in male GSCs and compared 
these estimates to our experimental results. To simplify our cal- 
culations, we made two important assumptions: first, we assume 
nucleosomal density to be even throughout the genome. This 
assumption allows us to infer that the overall fluorescent signal 
contributed by each chromosome is proportional to its respective
number of DNA base pairs. Second, by quantifying pre-existing
H3-GFP asymmetry in anaphase and telophase GSCs,
we estimate that the establishment of H3-GFP asymmetry is 
~4-fold biased, i.e., 80% on one set of sister chromatids and 
20% on the other set of sister chromatids, based on quantifica- 
tion of GFP signal in anaphase (GFP GSC side/GB side = 4.5) and 
telophase (GFP GSC side/GB side = 3.8) GSCs (Tran et al., 
2012). With these two simplifying assumptions, we calculate 
both GFP and mKO ratios among all 64 possible combinations 
(Table S2: 2 (for X-ch) × 2 (for Y-ch) × 4 (for 2nd ch) × 4 (for 3rd
ch) = 64 combinations in total). If we define asymmetry as a
greater than ~1.5-fold difference in fluorescence intensity, then
based on a model of randomized sister chromatid segregation,
we estimate that a symmetric pattern should appear for 53.1%
(34/64) of GSC-GB pairs, whereas conventional and inverted
asymmetric patterns should occur with equal frequencies,
each accounting for 18.7% (12/64) of total GSC-GB pairs. The remaining
9.4% (6/64) of GSC-GB pairs should produce histone
inheritance patterns with a 1.45- to 1.55-fold difference in signal
intensity (predicted ratios in Figure 4K).
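
The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation behind Table S2: the relative chromosome sizes in SIZES are hypothetical placeholders, the names classify and enumerate_patterns are invented for this sketch, and the 80%/20% split and the 1.45- and 1.55-fold cutoffs are taken from the text. Because the counts depend on the chromosome sizes used, no predicted frequencies are claimed here.

```python
from collections import Counter
from itertools import product

# Hypothetical relative DNA content per sister-chromatid pair (arbitrary units).
# Placeholder values for illustration only, not taken from the paper.
SIZES = {"X": 22.0, "Y": 40.0, "2a": 48.0, "2b": 48.0, "3a": 52.0, "3b": 52.0}

# Share of pre-existing H3-GFP on the two sisters (the ~4-fold bias in the text).
OLD_SHARE = (0.8, 0.2)


def classify(gsc, gb, low=1.45, high=1.55):
    """Label one GSC/GB signal pair using the fold cutoffs from the text."""
    fold = max(gsc, gb) / min(gsc, gb)
    if fold < low:
        return "symmetric"
    if fold <= high:
        return "borderline"
    return "conventional" if gsc > gb else "inverted"


def enumerate_patterns(sizes=SIZES):
    """Count pattern classes over every orientation of every sister pair."""
    names = list(sizes)
    counts = Counter()
    # 2 (X) x 2 (Y) x 4 (two 2nd homologs) x 4 (two 3rd homologs) = 64 cases.
    for orient in product((0, 1), repeat=len(names)):
        gsc = sum(sizes[n] * OLD_SHARE[o] for n, o in zip(names, orient))
        gb = sum(sizes[n] * OLD_SHARE[1 - o] for n, o in zip(names, orient))
        counts[classify(gsc, gb)] += 1
    return counts


if __name__ == "__main__":
    for label, count in sorted(enumerate_patterns().items()):
        print(f"{label}: {count}/64")
```

Whatever sizes are supplied, flipping every orientation swaps the GSC and GB totals, so the conventional and inverted classes always come out equal in number, in line with the equal frequencies stated above.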

This estimation is close to our experimental data in both 
H3T3A- and H3T3D-expressing testes (Figures 4J and 4K; Table 
S1). Of the 64 quantified post-mitotic GSC-GB pairs in 
nos>H3T3A testes, ~71.9% showed symmetric inheritance 
pattern. Conventional and inverted asymmetric patterns were 
detected at 9.4% and 12.5%, respectively, and 6.3% at the 
borderline. Similarly, of the 57 quantified post-mitotic GSC-GB 
pairs in nos>H3T3D testes, ~79.0% showed symmetric inheri- 
tance pattern. Conventional and inverted asymmetric patterns 
were detected at 7.0% and 10.5%, respectively with 3.5% of 
pairs at the borderline. Some differences between predicted ra- 
tios and our experimental data could be due to the simplified as- 
sumptions, the limited sensitivity of our measurement, and/or 
some coordinated chromatid segregation modes that bias the 
eventual read-out (Yadlapalli and Yamashita, 2013). In summary,
comparison between the modeling ratios and our experimental
data suggests that loss of the tight control of H3T3 phosphorylation
in GSCs randomizes segregation of sister chromatids enriched
with different populations of H3.

If the temporal separation in the phosphorylation of H3T3 on 
epigenetically distinct sister chromatids facilitates their proper 
segregation and inheritance during asymmetric cell division, it 
is likely that mutations of the Haspin kinase will also affect the 
temporal control of H3T3 phosphorylation. In the context of 
H3T3A, where the levels of H3T3P are already reduced, a further 
decrease in H3T3P by reducing Haspin levels should limit the 
GSC’s ability to distinguish between sister chromatids enriched 
with distinct H3. Indeed, haspin mutants enhance the phenotypes
in nos>H3T3A testes. A different situation appears in the
context of H3T3D where sister chromatids experience seemingly 
elevated levels of H3T3P at the start of mitosis. These elevated 
H3T3P levels may be exacerbated by the phosphorylation activ- 
ity of the Haspin kinase. Therefore, it is conceivable that by 
halving the levels of the Haspin kinase, H3T3 phosphorylation 
should be reduced to a level more closely resembling wild- 
type. In this way, some of the temporal specificity that is lost in 
the H3T3D mutant is restored, resulting in suppression of the 
phenotypes observed in nos>H3T3D testes. An exciting topic 
for future study would be to further explore how exactly Haspin 
phosphorylates H3T3 in the context of chromatin and whether 
H3T3A and H3T3D mutations act synergistically or antagonistically
in regulating asymmetric sister chromatid segregation
through differential phosphorylation of a key histone residue. 

It would also be interesting to understand the potential connec- 
tion between asymmetric histone inheritance and another phe- 
nomenon reported by several investigators: selective DNA strand 
segregation (reviewed by Evano and Tajbakhsh, 2013; Rando, 
2007; Tajbakhsh and Gonzalez, 2009). Recent development of 
the chromosome orientation fluorescence in situ hybridization 
(CO-FISH) technique (Falconer et al., 2010) allows study of 
selective chromatid segregation at single-chromosome resolu- 
tion. Using this technique in mouse satellite cells, it has been 
demonstrated that all chromosomes are segregated in a biased 
manner, such that pre-existing template DNA strands are pre- 
ferentially retained in the daughter cell that retains stem cell 
identity. Interestingly, this biased segregation becomes random- 
ized in progenitor non-stem cells (Rocheteau et al., 2012). Using 
CO-FISH in Drosophila male GSCs, sex chromosomes have 
been shown to segregate in a biased manner. Remarkably, 
sister chromatids from homologous autosomes have been 
shown to co-segregate independent of any specific strand prefer- 
ence (Yadlapalli and Yamashita, 2013). Such findings hint at a 
possible epigenetic source guiding the coordinated inheritance 
of Drosophila homologous autosomes. In many cases of biased 
inheritance, researchers have speculated about the existence of 
a molecular signature that would allow the cell to recognize 
and segregate sister chromatids bearing differential epigenetic 
information (Klar, 1994, 2007; Lansdorp, 2007; Rando, 2007; Yennek
and Tajbakhsh, 2013). However, the identity of such a signature
has remained elusive. The work presented in this paper
provides experimental evidence demonstrating that a tightly
controlled histone modification, H3T3P, is able to distinguish sister
chromatids and coordinate their segregation.

Epigenetic processes play important roles in regulating stem 
cell identity and activity. Failure to appropriately regulate epige- 
netic information may lead to abnormalities in stem cell behav- 
iors, which underlie early progress toward diseases such as 
cancer and tissue degeneration. Due to the crucial role that 
such processes play in regulating cell identity and behavior, 
the field has long sought to understand whether and how stem 
cells maintain their epigenetic memory through many cell divi- 
sions. Our results here suggest that the asymmetric segregation 
of pre-existing and newly synthesized H3-enriched chromo- 
somes may function to determine distinct cell fates of GSCs 
versus differentiating daughter cells. 



EXPERIMENTAL PROCEDURES 
Heat-Shock Scheme 

Flies with UASp-FRT-H3-GFP-PolyA-FRT-H3-mKO or UASp-FRT-H3T3A/D-GFP-PolyA-FRT-H3T3A/D-mKO
mutant transgene were paired with nos-Gal4
drivers. Flies were raised at 18°C throughout development until adulthood to 
avoid pre-flip (Tran et al., 2012). Before heat shock, 0- to 3-day-old males 
were transferred to vials that had been air-dried for 24 hr. Vials were sub- 
merged underneath water up to the plug in a circulating 37°C water bath for 
2 hr and recovered in a 29°C incubator for indicated time before dissection 
and immunostaining experiments. 

Temperature Shift Assay to Induce Germline Tumor in Adult Flies

Flies with UASp-FRT-H3-GFP-PolyA-FRT-H3-mKO or UASp-FRT-H3T3A/D-GFP-PolyA-FRT-H3T3A/D-mKO
paired with nos-Gal4, bam-Gal4, or Tj-Gal4
driver were raised at 18°C throughout development until adulthood. Newly eclosed
males were collected and shifted to 29°C for indicated time before
dissection and immunostaining experiments. 

Immunostaining Experiments 

Immunofluorescence staining was performed using standard procedures 
(Hime et al., 1996; Tran et al., 2012). Primary antibodies were mouse anti-α-spectrin
(1:50, DSHB 3A9), mouse anti-Fas III (1:50, DSHB, 7G10), mouse
anti-Armadillo (1:100; DSHB, N2 7A1 clone), rabbit anti-H3T3P (1:200, Millipore
05-746R), mouse anti-H3S10P (1:2,000; Millipore, #05-806), chicken
anti-CID (1:100; gift from Dr. Sylvia Erhardt, University of Heidelberg, Germany),
and rabbit anti-Vasa (1:200; Santa Cruz SC-30210). Secondary antibodies
were the Alexa Fluor-conjugated series (1:200; Molecular Probes). Lysotracker
(Invitrogen L7528) was applied according to the manufacturer's
recommendation. Images were taken using the Zeiss LSM 510 META or Zeiss
LSM 700 Multiphoton confocal microscope with 40× or 63× oil immersion
objectives and processed using Adobe Photoshop software.

EdU Incorporation to Label GSC-GB Pair at S Phase 

EdU labeling of the GSC-GB pairs at S phase was performed using the Click-iT EdU
Alexa Fluor 647 Imaging Kit (Life Science C10640) according to the manufacturer's
instructions. Dissected testes were immediately incubated in S2 medium with
100 μM EdU for 30 min at room temperature. The testes were subsequently
fixed and incubated with primary antibodies (anti-FasIII, anti-α-spectrin, and
anti-Vasa). Fluorophore conjugation to EdU was performed according to the
manufacturer's instructions and followed by incubation with secondary antibodies.

The addition of EdU helps distinguish the GSC-GB pairs undergoing
active DNA synthesis from those without EdU, which might be arrested due to
the heat-shock treatment. The cell-cycle progression is important for the 
incorporation and segregation of pre-existing versus newly synthesized H3. 

Quantification of GFP and mKO Intensity 

No antibody was added to enhance either GFP or mKO signal. Values of GFP
and mKO intensity were calculated using ImageJ software. DAPI signal was
used to determine the area of the nucleus for measuring both GFP and mKO
fluorescent signals. The raw reading was subsequently adjusted by subtracting
the fluorescence signal in the hub region, used as background, in both GSC and
GB nuclei, and the adjusted values were compared with each other.
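
As a minimal sketch of the adjustment described above, the hub-region reading can be subtracted from each nuclear reading before the GSC/GB ratio is taken. The function name and the numbers in the usage note are hypothetical; the actual measurements were made in ImageJ, and this only restates the arithmetic.

```python
def background_corrected_ratio(gsc_raw, gb_raw, hub_background):
    """Return the GSC/GB intensity ratio after subtracting hub background.

    All three arguments are raw fluorescence readings in the same channel
    (GFP or mKO); hub_background is the reading from the hub region.
    """
    gsc = gsc_raw - hub_background
    gb = gb_raw - hub_background
    if gsc <= 0 or gb <= 0:
        raise ValueError("background exceeds signal; ratio is undefined")
    return gsc / gb
```

For example, with illustrative readings of 110 (GSC), 35 (GB), and 10 (hub), the corrected values are 100 and 25, a 4-fold ratio; the uncorrected ratio would be about 3.1, so skipping the subtraction would understate the asymmetry.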
SUMMARY

Chemical cross-linking and DNA sequencing have 
revealed regions of intra-chromosomal interaction, 
referred to as topologically associating domains 
(TADs), interspersed with regions of little or no inter- 
action, in interphase nuclei. We find that TADs and 
the regions between them correspond with the 
bands and interbands of polytene chromosomes of 
Drosophila. We further establish the conservation 
of TADs between polytene and diploid cells of 
Drosophila. From direct measurements on light mi- 
crographs of polytene chromosomes, we then 
deduce the states of chromatin folding in the diploid 
cell nucleus. Two states of folding, fully extended fi- 
bers containing regulatory regions and promoters, 
and fibers condensed up to 10-fold containing cod- 
ing regions of active genes, constitute the euchro- 
matin of the nuclear interior. Chromatin fibers 
condensed up to 30-fold, containing coding regions 
of inactive genes, represent the heterochromatin of 
the nuclear periphery. A convergence of molecular 
analysis with direct observation thus reveals the ar- 
chitecture of interphase chromosomes. 

INTRODUCTION 

The basis of DNA folding and compaction in nuclei and chromo- 
somes is one of the great mysteries of biology. How are two 
meters of DNA packaged in an interphase nucleus ~10 μm in
diameter? At the molecular level, folding begins with wrapping 
of DNA around histones in the nucleosome (Kornberg, 1974; 
Luger et al., 1997). At a cytological level, chromatin appears as 
darkly staining heterochromatin, most abundant at the periphery 
of the interphase nucleus, and lightly staining euchromatin, in the 
nuclear interior. The folding of chromatin fibers in heterochromat- 
in and their organization in euchromatin have not been deter- 
mined. We have gained insight into chromosome condensation 
from molecular analysis of polytene chromosomes of Drosophila. 

Polytene chromosomes occur in cells of dipteran larvae that 
undergo as many as ten rounds of DNA replication without divi- 
sion, while retaining close alignment of sister chromatids and 
pairing of homologous chromosomes (Urata et al., 1995). Polytene
chromosomes are visible by light microscopy (Agard and
Sedat, 1983; Urata et al., 1995) and display an alternation of 
dense bands with less dense interbands (Balbiani, 1881; Flem- 
ming, 1882), which has been exploited for genetic analysis 
(Bridges, 1935). The increased density in bands reflects elevated 
DNA content and a stable state of chromatin condensation 
(Beermann, 1972). Two types of bands have been distinguished,
loosely compacted gray bands and dense, tightly compacted
intercalary heterochromatin (IH) bands (Vatolina et al., 2011; Zhimulev
et al., 2014). Median expression levels of genes within
gray bands are 27 times greater than that of genes within IH 
bands (Zhimulev et al., 2014). 

Chromatin folding has been detected at the molecular level, by 
“chromosome conformation capture,” in which the chemical 
cross-linking of chromosomal material is followed by fragmenta- 
tion, ligation, and DNA sequence analysis (Dekker et al., 2002). 
Extension of this approach by high-throughput sequencing 
(Hi-C) revealed, at a resolution of 1 Mb, a genome-wide interac- 
tion network of the human genome (Lieberman-Aiden et al., 
2009). Computational analysis of the Hi-C results identified two 
sets of chromosomal regions, termed compartments, within 
which very distant interactions occur more frequently than ex- 
pected for the random coil configuration of a polymer. The two 
compartments correlate with regions of transcriptional activity 
and inactivity. 

Increased sequencing depth revealed finer details of chro- 
mosome folding, at a resolution of 40 kb or better. So-called 
topologically associating domains (TADs) (Nora et al., 2012), 
also referred to as “physical domains” (Hou et al., 2012; Sexton 
et al., 2012) or “topological domains” (Dixon et al., 2012), in 
which nucleotide sequences far apart along the DNA come in 
close proximity to one another are observed at a length scale 
of a few Mb pairs or less. Hi-C also revealed loops, frequently 
bridging between enhancers and promoters, that correlate with 
gene activation. Loops are apparently transient, >250 nm apart 
in three-quarters of a cell population (Rao et al., 2014), and 
although important for gene regulation, are unlikely to provide 
a structural basis for heterochromatin and euchromatin. TADs, 
however, represent a consistent feature among cell types (Dixon 
et al., 2012, 2015) and may therefore relate to stably folded 
states of chromatin. In studies performed to date, however, 
TADs have only been revealed by DNA sequencing and not 
directly related to chromosome condensation. A hierarchical 
relationship between TADs and compartments has been sug- 
gested by computational models (Gibcus and Dekker, 2013; 
Figure 1. Lack of Regular, Long-Range Contacts in Polytene Chromosomes

(A) Genome-wide Hi-C heatmap from polytene cells. Black circles and squares represent where centromeres and telomeres intersect, respectively. 

(B) Hi-C heatmap from Drosophila embryos (Sexton et al., 2012) of a 17 Mb region of chromosome 3R encompassing the ANT-C and BX-C loci. Black circles 
represent where the ANT-C and BX-C loci intersect. 

(C) Hi-C heatmap from polytene cells of a 17 Mb region of chromosome 3R encompassing the ANT-C and BX-C loci. Black circles represent where the ANT-C and
BX-C loci intersect. 

Heatmaps were normalized and divided into 100 kb bins. See also Figures S1 and S2. 



Sexton et al., 2012). We report here on the direct visualization of 
TADs in polytene chromosomes of Drosophila, from which the 
relationship between TADs and compartments, as well as struc- 
tural correlates of chromatin condensation in the interphase nu- 
cleus, may be derived. 

RESULTS 

Hi-C on Drosophila Polytene Chromosomes 

We performed Hi-C on the salivary glands of wandering third 
instar larvae of Drosophila melanogaster (Figures 1A, 1C, S1,
and S2). The limited amount of material available from manually 
dissected, primary polytene tissue necessitated a Hi-C 
approach with improved signal-to-noise and limited the resolu- 
tion of our analysis to 15 kb (Figure S2; Supplemental Experi- 
mental Procedures). To make reliable comparisons between 
the polytene Hi-C data, other Hi-C datasets, and cytological observations,
we only considered chromosomal features of 75 kb
or larger. The highly underreplicated and repetitive nature of pericentromeric
heterochromatin also precluded reliable Hi-C analysis
in these regions. Differences in copy number across the
arms of polytene chromosomes due to incomplete DNA replica- 
tion do not affect Hi-C results (Figure S3; Supplemental Experi- 
mental Procedures). 

Our genome-wide, polytene Hi-C heatmaps revealed interac- 
tions between centromeres, but a lack of contact between 
telomeres (Figure 1A). These results are consistent with three- 
dimensional reconstructions of polytene chromosomes by light 
microscopy, in which centromeres were observed to cluster at 
the chromocenter at one pole of the nucleus with telomeres 
spread in the opposite hemisphere in a Rabl orientation (Hochstrasser
et al., 1986). Centromeres are also clustered together
in diploid Drosophila Kc167 cultured cells (Hou et al., 2012) 
and late-stage embryos (Sexton et al., 2012). Telomeres do not 
interact with one another in diploid cells (Hou et al., 2012), but 
do interact in embryos (Sexton et al., 2012). 
Distant or long-range interactions (separated by >1 Mb), such
as those previously observed between the ANT-C and BX-C loci 
in Hi-C heatmaps of late-stage (mixed cell-type) Drosophila 
embryos (Sexton et al., 2012) (Figure 1B) were not detected in
our polytene Hi-C heatmaps (Figure 1C). The lack of long-range 
interactions in our heatmaps was consistent with previously 
described light microscopy of polytene chromosomes in situ, 
which revealed only regular, short-range contacts and no repro- 
ducible, long-range interactions (Hochstrasser et al., 1986). 
Occasionally polytene bands interact with one another, result- 
ing in ectopic pairing. The frequency of cells exhibiting ectopic 
pairing is low (Zhimulev et al., 1982) and complete mixing of 
bands is not observed, which explains why, at present, such 
pairing falls below our limit of detection by Hi-C. 

We examined the proportion of paired-end reads that mapped 
to the same restriction fragment and pointed in the same direction,
reflecting interactions between paired and aligned chromatids
or homologous chromosomes (Sexton et al., 2012). A similar
proportion of paired-end reads fell into this class for our polytene 
chromosome dataset (0.0328%) and the reported embryonic da- 
taset (0.0553%). These values are much less than the proportion 
of paired-end reads representing intramolecular self-ligation 
(cyclization) events (0.117% and 0.132% in the polytene and embryo
datasets, respectively), indicating that reads due to interactions
between paired and aligned chromatids or homologous
chromosomes are very infrequent. A higher proportion of reads 
in the reported diploid chromosome (Kc167 cell line) dataset fall 
in this class (22.5% compared to 8.35% of paired-end reads rep- 
resenting intramolecular self-ligation events), but the chromatin 
was fragmented to a median size of 2,377 bp, compared with 
193 bp in the polytene and embryonic experiments. Homologous
chromosome pairing is prevalent in many Drosophila primary tis- 
sues and cultured cells (Fung et al., 1998; Williams et al., 2007), 
but the analysis of paired-end reads indicates that paired homo- 
logs or chromatids are not perfectly aligned at the level of a few 
hundred base pairs, even in polytene chromosomes, and the de- 
gree of alignment in polytene chromosomes is approximately the 
same as that in late-stage embryos. 

Equivalence of Polytene Bands with TADs 

Our polytene Hi-C heatmaps showed the presence of TADs, re- 
gions of high self-interaction, visible as boxes centered on the di- 
agonal (Figures 2A, 5A, and S3). Our data revealed 346 polytene 
TADs. The median TAD size was 165 kb; the mean TAD size was
195 kb (Figure S3; Table S1). Superimposing the Hi-C heatmap
onto the locations of polytene bands for which reliable DNA
sequence coordinates have previously been determined by fluo- 
rescence in situ hybridization (FISH) and non-histone protein 
localization (Belyaeva et al., 2012; Vatolina et al., 2011) demon- 
strated a correspondence of TADs with bands (Figures 2A, 2B, 
and 5A). 

For a quantitative assessment of the interaction pattern within 
polytene bands, we computed the genome-wide directionality 
index, a measure of the degree of bias of a locus for interaction 
with downstream (positive directionality index) or upstream loci 
(negative directionality index) (Dixon et al., 2012). Characteristi- 
cally, the directionality index of a TAD starts positive, goes to 
zero at the middle of the TAD, and continues toward negative 
values at the end of a TAD (Dixon et al., 2012). Polytene bands 
exhibit this same trend, supporting their identification as TADs 
(Figure 2C). 
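
The directionality index can be illustrated with a short sketch. It assumes the chi-square-like form commonly attributed to Dixon et al. (2012): with A the contacts of a bin with upstream bins and B those with downstream bins inside a fixed window, and E = (A + B)/2, the index is sign(B - A) × [(A - E)²/E + (B - E)²/E]. The function names, the window length, and the toy matrix are illustrative choices, not the parameters used for the polytene data.

```python
def directionality_index(contacts, window):
    """Compute a per-bin directionality index from a square contact matrix.

    contacts is a list of equal-length rows (bin-by-bin contact counts);
    window is the number of bins considered on each side of a bin.
    """
    n = len(contacts)
    index = []
    for i in range(n):
        upstream = sum(contacts[i][max(0, i - window):i])
        downstream = sum(contacts[i][i + 1:i + 1 + window])
        expected = (upstream + downstream) / 2.0
        if expected == 0 or upstream == downstream:
            index.append(0.0)
            continue
        sign = 1.0 if downstream > upstream else -1.0
        chi = ((upstream - expected) ** 2 + (downstream - expected) ** 2) / expected
        index.append(sign * chi)
    return index


def toy_matrix(block=4, blocks=2, inside=10.0, outside=1.0):
    """Build a toy matrix with dense square domains along the diagonal."""
    n = block * blocks
    return [[inside if i // block == j // block else outside for j in range(n)]
            for i in range(n)]
```

Applied to toy_matrix() with window=3, the index is positive in the first bin of each four-bin domain and negative in the last, which is the sign change described above for a TAD.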

The concordance between polytene bands and TADs was 
observed across all of the large chromosome arms (Figure 2D). 
At least 95% of polytene bands corresponded to uninterrupted 
TADs. The overlap between polytene bands and TADs was far 
greater than expected on a random basis (Z score = 10.5, p = 
3.01 X 10“^^; Figure S4B; Supplemental Experimental Proce- 
dures). Differences were observed almost exclusively at band 
boundaries. Mismatches between polytene bands and TADs 
were far less than expected on a random basis (Z score = 
-10.5, p = 2.78 X 10“^^; Figure S4B; Supplemental Experi- 
mental Procedures). The majority (57.4%) of band boundaries 
were located within 20 kb of TAD boundaries (Figure 2E). Dis-
crepancies at the boundaries were likely due to the limited reso-
lution of the Hi-C experiment, limited accuracy of DNA sequence
coordinates for polytene band borders, or both. Near identity of 
boundaries does not necessarily indicate equivalence of entire 
domains, but quantitative, unbiased examination of entire poly- 
tene bands revealed far greater equivalence with polytene 
TADs than expected on a random basis (Figure S5B; Supple- 
mental Experimental Procedures). 
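
The boundary comparison summarized in Figure 2E amounts to measuring, for each band boundary, the distance to the closest TAD boundary. A minimal sketch of that calculation follows. The function name and the coordinates are hypothetical and serve only to show the arithmetic; they are not values from this study.

```python
def fraction_within(band_boundaries, tad_boundaries, max_distance):
    """Fraction of band boundaries within max_distance (bp) of the nearest TAD boundary."""
    hits = 0
    for b in band_boundaries:
        nearest = min(abs(b - t) for t in tad_boundaries)
        if nearest <= max_distance:
            hits += 1
    return hits / len(band_boundaries)


# Hypothetical coordinates in bp (not taken from the paper).
bands = [100_000, 250_000, 400_000, 610_000]
tads = [105_000, 240_000, 450_000, 600_000]
frac = fraction_within(bands, tads, max_distance=20_000)
print(frac)
```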

Polytene Puffs Are Not TADs 

High transcriptional output of some genes at certain stages of 
development causes the conversion of polytene bands to puffs, 
regions of decondensed chromatin with a diameter wider than 
the rest of the chromosome (Ashburner, 1967). Loci that are 
known to convert from bands to puffs at the wandering third 



Figure 2. Polytene Bands Are TADs 

(A) Normalized Hi-C heatmap (15 kb bins) of a 3 Mb region of polytene chromosome 2L. Left: areas bounded by black boxes represent locations of polytene bands
for which reliable DNA sequence coordinates are available (Belyaeva et al., 2012; Vatolina et al., 2011). Right: areas bounded by black boxes represent TADs. 

(B) Photographic image (bottom) of the region of polytene chromosome 2L from (A) and the same region of Bridges’s chromosome map (top). Arrows indicate 
bands represented by black boxes in (A). The DNA sequence coordinates of other bands in this region are not known and therefore cannot be compared with the 
Hi-C data. Adapted from Lefevre (1976). To prepare the panel, the original photographic image from Lefevre was digitally scanned and then cropped to display the
region of the chromosome corresponding to that of the Hi-C analysis in (A). Black arrows were then superimposed onto the cropped image using Adobe Illustrator 
to mark the polytene bands represented by black boxes in (A). 

(C) Mean directionality index (DI) of polytene bands (upper panel; n = 61) and heatmap of the directionality index of each band along its length (lower panel). Bands
were normalized to the same length and 50 kb of flanking DNA is shown next to each normalized band. 

(D) Heatmap of the agreement between polytene bands (n = 61) and TADs. Each band is represented by a row. Bands were normalized to the same length and 50
kb of flanking DNA is shown next to each normalized band. Orange segments overlap with polytene TADs, black segments overlap with regions between TADs. 

(E) Fraction of band boundaries (n = 122) at the distance indicated on the abscissa from the closest TAD boundary (calculated in 20 kb windows). 

See also Figures S3, S4, and S5 and Table S1.
Figure 3. Hi-C of Polytene Puffs

(A) Normalized Hi-C heatmaps (15 kb bins) of puff 
stage five to eight polytene puffs on chromosome 
3L. Areas bounded by black boxes correspond to 
the indicated puff. 

(B) Mean directionality index (DI) of polytene puffs
(left) and heatmap of the directionality index of each 
puff along its length (right). Puffs were normalized 
to the same length and 50 kb of flanking DNA is 
shown next to each normalized puff. 
instar larval stage showed no evidence of TADs, but rather
strong signals restricted to the diagonal (Figure 3A). Further- 
more, quantitation of the interaction pattern of puffs showed a 
mostly uniform directionality index centered around zero, rather 
than a TAD-like bias in directionality (Figure 3B). The absence of 
TADs from puffed bands strengthens the connection between 
TADs and chromatin condensation. 

Hi-C Predicts the Location of Bands and Interbands 

As a further test of the relationship between TADs and bands, we 
asked whether the DNA sequence of a TAD could be used to pre- 
dict the location of a band for which reliable DNA sequence infor- 
mation was not available. We designed fluorescent probes to the 
centers and borders of three TADs and hybridized them individ- 
ually to polytene chromosomes (Figure 4). In every case, the 
FISH signal from a TAD center probe precisely overlapped with 
a polytene band. In five of six cases, the FISH signal from a 
TAD border probe overlapped with the adjacent polytene inter- 
band. In the one case where the TAD border probe overlapped 
with a band (Figure 4A; centromere proximal probe), the overlap 
occurred at the margin of the band immediately adjacent to a 
small interband (between bands 22A1-2 and 22A3 in Bridges’s 
map) (Bridges, 1935; Lefevre, 1976). Inasmuch as only 5% of 
DNA is located within interbands (Beermann, 1972), our success 
rate of 0.833 in identifying interbands is much greater than ex-
pected (p = 1.80 × 10^-6, binomial test).
FISH thus confirms our Hi-C results and 
our identification of TADs with polytene 
bands. 
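
The binomial test quoted above can be checked with a few lines of arithmetic. The sketch below assumes a one-sided exact test (five or more interband hits among six border probes) with a null probability of 0.05 per probe, taken from the fraction of DNA located in interbands. The sidedness is an assumption, as the text does not state it, and the function name is illustrative.

```python
from math import comb


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Five of six TAD border probes fell in interbands; 5% of DNA is interband.
p_value = binomial_tail(5, 6, 0.05)
print(f"{p_value:.3g}")
```

Under these assumptions the tail probability is approximately 1.80 × 10^-6.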

Conservation of TADs between 
Polytene and Diploid Cells 

Hi-C was previously performed on 
D. melanogaster diploid, Kc167 cultured 
cells (Hou et al., 2012), and late-stage em- 
bryos (Sexton et al., 2012). Both studies 
revealed TADs as an organizational 
feature of the Drosophila genome. The 
Hi-C heatmap from diploid cells and our 
Hi-C heatmap from polytene cells were 
closely similar (Figures 5A and 5B) and 
highly correlated (genome-wide Pear-
son’s r = 0.793, p < 2.2 × 10^-16). Where
comparisons could be made, diploid 
TADs could also be seen to correspond 
to polytene bands (Figures 5B and 5C). 
The overlap between polytene TADs and diploid TADs (Figure 5D) 
was far greater than expected on a random basis (Z score = 20.5, 
p = 3.92 X 10“®^; Figure S4B; Supplemental Experimental Pro- 
cedures). Similarly, mismatches between polytene bands and 
TADs were far less than expected on a random basis (Z score = 
-20.6, p = 6.89 X 10“®^; Figure S4B; Supplemental Experi- 
mental Procedures). Approximately 50% of polytene TAD 
boundaries were located within 40 kb of diploid TAD boundaries 
(Figure 5E) and quantitative assessment of entire polytene TADs 
revealed far greater equivalence with Kc cell TADs than ex- 
pected on a random basis (Figure S5C; Supplemental Experi- 
mental Procedures). We observed a similar correspondence in 
the Hi-C heatmaps between polytene cells and late-stage em-
bryos (Figure S6; genome-wide Pearson’s r = 0.793, p < 2.2 ×
10^-16). TADs from polytene cells and Drosophila embryos signif-
icantly overlapped (Z score = 15.2, p = 2.10 x 10“^^; Figure S4B;
Supplemental Experimental Procedures) and exhibited far fewer
mismatches than expected on a random basis (Z score = -15.3,
p = 8.19 X 10“^^; Figure S4B; Supplemental Experimental Pro-
cedures). The agreement was even more remarkable consid-
ering that late-stage embryos contain a mixture of cell types, un-
like the more homogeneous composition of the salivary gland.
Evidently, most TADs are conserved in their central regions
across a range of cell types, while a fraction of TAD boundaries
may exhibit some cell-type specificity. 
Figure 4. Hi-C Predicts the Location of Polytene Bands

The locations of TADs (normalized Hi-C heatmaps, 15 kb bins; left column) were used to generate FISH probes against TAD centers (red diamonds) or TAD 
borders (green diamonds), which were hybridized to polytene chromosome spreads counterstained with DAPI (middle-left column). The TAD border FISH signal 
and the TAD center FISH signal are pseudocolored green and red, respectively, in the merged images (rightmost two columns) and the identity of the polytene 
band is indicated (right column). White arrows indicate the polytene band of interest. 

(A) FISH against a region from chromosome 2L. White arrowhead indicates a small interband between bands 22A1-2 and 22A3 in Bridges’s map. 

(B) FISH against a region from chromosome 3L. 

(C) FISH against a region from chromosome 3R. 

Loci presented here are independent of those analyzed in Figure 2. Scale bars, 2 μm.



Lack of Compartments in the Polytene Nucleus 

Compartmentation refers to the preferential association of 
distant chromosomal loci in two groups, which correlate with 
transcriptional activity and inactivity. At large length scales, 
chromosomes exhibit polymeric behavior: loci separated by 
large linear distances are less likely to be in close spatial prox- 
imity than loci separated by small distances (Lieberman-Aiden 
et al., 2009). Associations of distant loci become apparent 



when a correction is applied for polymeric behavior: the 
observed number of interactions between two loci is divided 
by the number of interactions expected due to variations in 
polymer conformation. The resulting observed/expected heat- 
map (Figure 6, middle column) displays regions of more or less 
interaction than the expected chromosome-wide average. The 
delineation of such regions can be enhanced by correlation anal- 
ysis, because neighboring loci along the genome in close spatial 
proximity should share the same interaction preferences and
spatially distant loci should differ in their interaction preferences. 
A heatmap of the correlations between interaction profiles 
shows a “plaid” pattern if there are both shared and divergent 
interaction preferences (Figures 6, right, and S7) (Lieberman-Ai- 
den et al., 2009). Boxes in the plaid pattern are of two types, 
boxes along the diagonal, at the locations of TADs in the 
observed heatmap, and off-diagonal boxes, revealing enriched 
or depleted long-range interactions. The occurrence of off-diag- 
onal boxes is indicative of compartments. Correlation heatmaps 
computed from published Hi-C data for Kc167 cells and late- 
stage Drosophila embryos (Figures 6, top two rows, and S7) 
exhibit a plaid pattern and thus compartments, as previously 
observed (Sexton et al., 2012). By contrast, a correlation heat-
map computed from our Hi-C data for Drosophila polytene chro-
mosomes shows no evidence of a plaid pattern (Figures 6, bot-
tom row, and S7), indicating an absence of long-range
interactions and a lack of compartments defined on that basis. 
A lack of compartments is not necessarily a consequence of ho- 
molog pairing, because homologs are paired in both embryonic 
(Fung et al., 1998) and Kc (Williams et al., 2007) cells. 

DISCUSSION 

Our finding of an equivalence between polytene bands and 
TADs has 2-fold significance. It complements the discovery of 
TADs by chemical cross-linking with their identification by a 
structural approach, and it shows that chromatin folding in- 
ferred from cross-linking corresponds to bona fide chromo- 
some condensation, to a state so dense it can be seen by light 
microscopy. Polytene bands are stable from cell to cell (Painter, 
1933), as well as when observed in real time (Hochstrasser 
et al., 1986); TADs are therefore similarly stable. Interbands 
correspond to regions between TADs, which thus reflect a sta- 
ble state of chromosome decondensation. The band-interband 
pattern is similar between polytene nuclei of different tissues 
(Beermann, 1972). TADs evidently represent a largely invariant, 
or conserved, feature of chromosome structure. The equiva- 
lence of polytene, diploid, and embryonic TADs shows that 
chromosome structure manifest in the polytene state is general, 
applicable to the diploid interphase nucleus, and nearly con- 
stant among all cell types. 



Although TADs are stable between embryonic stem (ES) cells 
and ES cell-derived lineages, interactions within and between 
TADs may change during ES cell differentiation (Dixon et al., 
2015). So TADs themselves do not directly regulate transcrip- 
tion, but rather additional features, such as transient chromatin 
loops, are presumably involved. The equivalence of poly-
tene bands with TADs indicates that the role of TADs is most
likely for the compaction of DNA in the interphase nucleus. 

Fine mapping of regions of polytene chromosomes (Zhimulev 
et al., 2014) has shown that interbands contain regulatory re- 
gions and promoters of genes expressed in most tissues, cell 
lines, developmental stages, and treatment conditions— so- 
called “housekeeping genes”. Regulatory regions, transcription 
start sites, and 5' transcribed regions are located in interbands, 
and the remaining coding portions of genes reside in the adja- 
cent gray bands (Zhimulev et al., 2014). Only half of total RNA 
synthesis occurs in puffs (Zhimulev and Belyaeva, 1975); the 
other half may occur in gray bands. Although the moderate res- 
olution of our Hi-C analysis and the lack of comprehensive epige- 
nomic profiling in salivary gland tissues preclude the direct 
detection of active and inactive TADs in polytene chromosomes, 
our identification of bands as TADs, along with the characteris- 
tics of gray bands, implies a connection between loosely 
compact gray bands and “active TADs” reported by others 
(Rao et al., 2014; Sexton et al., 2012). TADs and the regions 
between them are conserved across cell types likely because 
transcription is regulated and initiated in interbands and many 
transcribed genes are active in all cell types. 

Conserved interbands or regions between TADs may corre- 
spond not only to regulatory regions involved in gene activity 
but also to those needed for gene repression across cell types. 
The 5'-regulatory region of Notch, which is transcriptionally 
inactive in salivary glands, lies in an interband between bands 
3C6 and 3C7, with the coding region in band 3C7 (Rykowski 
et al., 1988). The Notch mutation facet-strawberry (fa^swb) results
in a small deletion harboring an insulator element (Vazquez and
Schedl, 2000), which maps to the 3C6-3C7 interband (Rykowski
et al., 1988). In the mutation, the 3C6-3C7 interband dis- 
appears, resulting in fusion of bands 3C6 and 3C7 (Keppy 
and Welshons, 1977). Ectopic insertion of the DNA sequence 
comprising the deletion was necessary and sufficient to 
split an endogenous band in two and form an interband at the 



Figure 5. Conservation of Polytene and Diploid TADs 

Normalized Hi-C heatmaps (15 kb bins) of a 3 Mb region of the X chromosome. Left panels: areas bounded by black boxes represent locations of polytene 
bands for which reliable DNA sequence coordinates are available (Belyaeva et al., 2012; Vatolina et al., 2011). Right panels: areas bounded by black boxes 
represent TADs. 

(A) Hi-C heatmap from polytene cells. 

(B) Hi-C heatmap from diploid Kc167 cultured cells. 

(C) Photographic image (bottom) of the region of polytene chromosome X from (A) and (B) and the same region of Bridges’s chromosome map (top). Arrows 
indicate bands represented by black boxes in (A) and (B). The DNA sequence coordinates of other bands in this region are not known and therefore cannot be 
compared with the Hi-C data. Adapted from Lefevre (1976). To prepare the panel, the original photographic image from Lefevre was digitally scanned and then 
cropped to display the region of the chromosome corresponding to that of the Hi-C analysis in (A) and (B). Black arrows were then superimposed onto the 
cropped image using Adobe Illustrator to mark the polytene bands represented by black boxes in (A) and (B). 

(D) Heatmap of the agreement between polytene TADs and diploid TADs. Each polytene TAD is represented by a row. Green segments overlap with diploid TADs, 
black segments overlap with regions between diploid TADs. TADs were normalized to the same length and 50 kb of flanking DNA is shown next to each 
normalized TAD. 

(E) Fraction of polytene TAD boundaries (n = 692) at the distance indicated on the abscissa from the closest diploid TAD boundary (calculated in 20 kb windows). 
See also Figures S4, S5, and S6. 
Figure 6. Lack of Compartments in the Polytene Nucleus

Normalized observed (left column), observed/expected (middle column), and Pearson correlation (right column) Hi-C heatmaps (100 kb bins) for embryonic (top 
row), diploid Kc167 cell (middle row), and polytene (bottom row) chromosome 3R. In the observed/expected heatmaps, interactions less than the expected 
chromosome-wide average are blue, those greater than the expected chromosome-wide average are red. A plaid pattern in a Pearson correlation heatmap 
indicates the presence of compartments. 

See also Figure S7. 



ectopic site (Andreyenkov et al., 2010). Although a factor that 
binds to the insulator has not yet been identified, maintain- 
ing the chromatin in a fully extended state would facilitate pro- 
tein binding. If the same gene or set of genes is transcriptionally 
repressed across cell types, the state of chromatin and acces- 
sibility to transcriptional repressors may also be conserved, 
further accounting for the similarity in Hi-C results across cell 
types. 

From the DNA sequences of bands and interbands and mea-
surements on micrographs of polytene chromosomes, the pack-
ing ratios (length of DNA to length of chromatin) of bands and in-
terbands may be determined. The ratios range from 158:1 to



205:1 for IH bands, 12:1 to 73:1 for gray bands, and 5:1 to 
12:1 in interbands (Rykowski et al., 1988; Vatolina et al., 2011). 
Conservation of Hi-C results between polytene and diploid cells 
allows us to deduce the structural states of chromatin, which 
we refer to as black, gray, and white, respectively, in the inter- 
phase, diploid nucleus. The packing ratio of a fully open chain 
of nucleosomes is expected to be 6.8:1 (Kornberg, 1974), so 
white chromatin evidently contains fully extended chromatin fi-
bers. Approximately 5% of Drosophila DNA, containing pro-
moters and regulatory regions, resides in this fully extended
state (Beermann, 1972; Hou et al., 2012). Gray chromatin is up
to 10-fold more compact and contains approximately a quarter
of the Drosophila genome (Filion et al., 2010; Sexton et al., 2012),
within which are located the coding regions of active genes. 
Therefore, transcriptional regulation occurs in a fully extended 
state and elongation occurs in a partially compacted state. Black 
chromatin is up to 30-fold more condensed than fully extended 
chromatin fibers. It contains inactive genes and accounts for 
the remaining 70% of the genome. 
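
The fold-compaction figures quoted above follow from dividing each measured packing ratio by the 6.8:1 ratio expected for a fully open chain of nucleosomes. The short sketch below reproduces that arithmetic from the ranges given in the text; the variable names and dictionary labels are illustrative only.

```python
EXTENDED = 6.8  # packing ratio of a fully open nucleosome chain (Kornberg, 1974)

# Packing-ratio ranges (length of DNA : length of chromatin) quoted in the text.
ranges = {
    "white (interbands)": (5, 12),
    "gray (gray bands)": (12, 73),
    "black (IH bands)": (158, 205),
}

# Compaction of each state relative to a fully extended fiber.
fold = {name: (low / EXTENDED, high / EXTENDED) for name, (low, high) in ranges.items()}
for name, (low, high) in fold.items():
    print(f"{name}: {low:.1f}- to {high:.1f}-fold")
```

The upper bounds come out near 10-fold for gray chromatin and near 30-fold for black chromatin, in line with the values stated in the text.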

The distribution of interactions within a TAD is informative 
about the pattern of chromatin condensation. The distribution 
is remarkably uniform, with every site in a TAD, particularly inac- 
tive TADs (Sexton et al., 2012), almost equally likely to become 
cross-linked with every other site (Figure 6B). Additionally, since 
homologs are paired in nearly all cell types (Fung et al., 1998; Wil-
liams et al., 2007), Drosophila chromosomes are particularly illu-
minating with regard to chromatin condensation. Although
cytology clearly demonstrates homologous chromosome pair- 
ing, our Hi-C analysis indicates that, even for polytene chromo- 
somes, at the molecular level they are not perfectly aligned. 
Nucleosome by nucleosome, the chromatin fibers from homo- 
logs and chromatids are not in lockstep— there is some vari- 
ability in the path of each of the chromatin fibers. Although at 
the cytological level, bands are present at reproducible posi- 
tions, there is some variation at the molecular level in the path 
of the DNA. Taken together, the interactions revealed by Hi-C 
cannot arise primarily from specific, stable chromatin loops, 
but rather the pattern of condensation within gray and black 
chromatin must vary from one cell to another within a population,



Figure 7. Chromosome Condensation in 
the Interphase Nucleus 

Left: thin section electron micrograph of a nucleus
(Cross and Mercer, 1993), with lightly staining
euchromatin in the nuclear interior, surrounded
by darkly staining heterochromatin, concentrated at
the nuclear periphery. Right: cartoon representation
of white, gray, and black chromatin, showing pro-
posed relationships to heterochromatin, euchro-
matin, and the nuclear envelope (yellow). Active
TADs in the euchromatin are nearby other active
TADs and inactive TADs in the heterochromatin are
nearby other inactive TADs, resulting in gray-gray
and black-black TAD-TAD interactions. The actual
pattern of chromatin folding is unknown and indi-
cated only schematically.



consistent with conclusions of others 
(Gibcus and Dekker, 2013). There can 
be no unique pattern of condensation. 

Studies of chromosomal protein distri- 
bution and histone modifications have re- 
vealed up to 16 functional classes of 
chromatin (Ernst et al., 2011; Filion 
et al., 2010; Ho et al., 2014; Kharchenko 
et al., 2011; Ram et al., 2011; Rao et al.,
2014; Sexton et al., 2012), and these clas-
ses are conserved between polytene and
diploid cells (Zhimulev et al., 2014). One 
class corresponds closely with inter-
bands, other classes with the flanking
bands (Zhimulev et al., 2014). Structural information from poly- 
tene chromosomes identifies three structural states - black, 
gray, and white. A structural state may encompass multiple func- 
tional classes. For example, at the current level of resolution, we 
cannot distinguish between polycomb and HP1 repressed 
heterochromatin. 

The equivalence of polytene bands with polytene TADs and 
the virtual identity of polytene TADs with diploid TADs imply a 
close correspondence of bands with diploid TADs. The structure 
of interphase chromosomes revealed by light microscopy of 
polytene nuclei therefore applies to the organization of the 
diploid cell nucleus. A subset of TADs in mammalian cells asso- 
ciates with the nuclear lamina (Dixon et al., 2012), and 
condensed chromatin is mostly located at the periphery of nuclei 
in electron micrographs of virtually all fixed, embedded, heavy 
metal-stained eukaryotic cell preparations (Figure 7, left). Poly- 
tene bands are also identified by heavy metal staining in electron 
micrographs and interactions of polytene chromosomes with the 
nuclear envelope, although infrequent, are almost entirely 
confined to the IH bands (Hochstrasser et al., 1986). If inactive 
TADs (black chromatin) are often located near the nuclear pe- 
riphery, then the regions between inactive TADs, consisting of 
active TADs and fully extended chromatin fibers (white and 
gray chromatin), must loop into the interior of the nucleus (Fig- 
ure 7, right). It follows that inactive TADs (black chromatin) corre- 
spond to classical heterochromatin (condensed chromatin at the 
nuclear periphery) and active TADs and the regions between 
TADs (white and gray chromatin) correspond to classical euchro-
matin (less dense chromatin in the nuclear interior). On the basis
of the measurements mentioned above, heterochromatin at the 
nuclear periphery consists of up to 30-fold condensed chromatin 
fibers, while euchromatin in the nuclear interior contains both 10-
fold condensed and fully extended fibers. The DNA sequences
that reside in each of these regions are known from the results 
of Hi-C analysis. The identification of polytene TADs thus gives 
meaning to the familiar picture of the interphase nucleus. 

Compartments can be understood in terms of the physical pic- 
ture of the interphase nucleus. TADs and compartments are 
related, but differ in origin and significance. TADs are aligned 
with the off-diagonal boxes in the plaid pattern that defines com- 
partments; the boundaries of TADs define the rows and columns 
of the plaid pattern (Figure 6, right column). Therefore the off-di- 
agonal boxes arise from interactions between TADs. The differ- 
ence is that TADs are due to persistent interactions. They reflect 
a stable state of axial condensation of the chromatin fiber. By 
contrast, off-diagonal boxes are most apparent following correc- 
tion for random polymer conformation and sharpening of the 
corrected heatmap. Off-diagonal boxes are therefore due to 
transient contacts between distant regions, for example short- 
lived contacts between TADs in neighboring euchromatic chro- 
matin in the nuclear interior (Figure 7, right). This picture explains 
why off-diagonal boxes fall in two categories of interacting se- 
quences, corresponding to active and inactive genes: active 
TADs (gray chromatin) tend to be in the vicinity of other active 
TADs and inactive TADs (black chromatin) in the vicinity of other 
inactive TADs. In observed heatmaps, the off-diagonal boxes 
usually appear uniform in signal strength over their entire area, 
indicative of roughly equal likelihood of interaction of every 
sequence in one TAD with every sequence in another (Sexton 
et al., 2012). The reason is probably because the TADs them- 
selves are condensed in a variable manner, as discussed above. 
All sequences have roughly equal probabilities of appearing on 
the surface of the condensed structure and interacting with adja- 
cent condensed regions. Moreover, a single TAD may make 
many distant contacts, so there cannot be a unique pattern of 
TAD-TAD interactions, but rather the trajectory of a chromosome 
must vary from one cell to another. 

Compartments refer to loci grouped on the basis of frequency 
of interaction. Compartments are not regions with boundaries, 
in the conventional sense of the term. They correspond to areas 
of the nucleus, for instance the interior and the periphery, but the 
areas themselves do not determine the state of gene activity 
(Therizols et al., 2014). The properties of polytene chromosomes 
are illustrative. Polytene chromosomes possess both active and 
inactive genes, but they exhibit no compartments, and they 
reside mostly in the nuclear interior, making little or no contact 
with the periphery. They lack compartments because they do 
not fold upon themselves to a significant extent and therefore 
have no TAD-TAD interactions. TADs may occur without com- 
partments, and chromosome condensation and gene regula- 
tion do not require compartments. Condensation along the chro- 
mosome axis is conserved, as shown by the equivalence of 
polytene, diploid, and embryonic TADs, whereas compartments 
may vary, depending on cell type and state of transcriptional 
activity. 



The organization of the interphase nucleus in Drosophila is 
relevant to the mouse and to humans, where TADs organize 
chromosomes into spatial modules connected by short chro- 
matin segments (Dixon et al., 2012). Furthermore, biochemical 
fractionation of open chromatin fibers from human cells revealed 
that the fibers are cytologically decondensed (Gilbert et al., 
2004), and it is now apparent that these fibers are likely in the fully 
extended state. The packing ratios, DNA sequences, functional 
states, and chromosomal protein patterns of the differentially 
staining areas of the interphase nucleus are thus determined. 
Genome-wide amplification and alignment in the polytene state 
reveals interphase chromosome structure at the level of light mi- 
croscopy, likely applicable to the diploid state in all monocentric 
metazoans. 

EXPERIMENTAL PROCEDURES 
Hi-C 

Hi-C was performed using a tethering approach (Kalhor et al., 2012) to 
improve the signal-to-noise from a limited amount of manually dissected, 
primary tissue. In brief, Drosophila melanogaster third instar larvae salivary
glands were manually dissected and fixed with 2% EM grade paraformal- 
dehyde. Cross-linked proteins were then biotinylated at cysteine residues 
and the DNA digested with DpnII. Digested chromatin was bound to strep- 
tavidin beads, thoroughly washed to remove uncross-linked DNA, DNA 
ends filled in with biotin-14-dATP, and free DNA ends ligated together.
DNA-protein cross-links were reversed, DNA purified, biotinylated nucleo- 
tides marking unligated ends removed, and then DNA sheared to a 
mean size of ~200 bp. The biotinylated DNA was pulled down with strep-
tavidin beads, prepared for and subjected to high-throughput Illumina
sequencing. Further details provided in the Supplemental Experimental 
Procedures. 

Hi-C Analysis 

Hi-C reads were mapped to the dm3 reference genome using Bowtie 2 and 
assigned to DpnII restriction fragments. Reads mapping to the same restric- 
tion fragment, separated by less than the library insert size, within 4 bp of a 
restriction site, and duplicate reads were removed. Exceptionally large 
(>100 kb) and small (<100 bp) restriction fragments and fragments with 
the highest 0.5% of counts were also removed. Filtered fragments were as- 
signed to 15 kb genomic bins, unless otherwise indicated. Further filtering at 
the bin level removed bins where less than half the bin was sequenced, the 
lowest 1% of bins, and the highest 0.05% of interchromosomal bins. The 
resulting Hi-C heatmaps were normalized using a previously described iter- 
ative approach (Imakaev et al., 2012). All of the above steps were performed 
using a previously described pipeline (Imakaev et al., 2012). 

The genome-wide directionality index (DI), a modified chi-square statistic
to measure the directional interaction bias of a locus, was determined as 
previously described (Dixon et al., 2012). TADs were identified by using 
the value midway between the mean of the values in the lowest two deciles 
of the contact probability along the diagonal and the mean of the values in 
the highest two deciles of the contact probability along the diagonal as a 
threshold for a low-pass filter. TADs were further required to have a mini- 
mum size of 75 kb to generously satisfy the Nyquist sampling criterion. Sta- 
tistical comparisons between polytene bands and TADs or between TADs 
from different cell-types employed a bootstrapping approach to determine 
the significance of (1) the overlap between the features in the two lists,
(2) the aggregate unmatched length of the features in the two lists, and
(3) the Euclidean distance between the corners of the nearest features in
the two lists.
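
The thresholding step described above can be sketched as follows. The sketch computes the value midway between the mean of the lowest two deciles and the mean of the highest two deciles of a per-bin signal, then reports runs of bins above that value that span at least 75 kb (five 15 kb bins). Treating the contact probability along the diagonal as a simple per-bin list, omitting the low-pass filter itself, and calling domains as contiguous above-threshold runs are all assumptions made for illustration; the function names are hypothetical and this is not the procedure as implemented for the study.

```python
def midpoint_threshold(signal):
    """Value midway between the means of the lowest and highest two deciles."""
    ordered = sorted(signal)
    k = max(1, len(ordered) // 5)  # two deciles = one fifth of the values
    low_mean = sum(ordered[:k]) / k
    high_mean = sum(ordered[-k:]) / k
    return (low_mean + high_mean) / 2.0


def call_domains(signal, bin_size=15_000, min_size=75_000):
    """Return (start_bp, end_bp) for runs of bins above the threshold."""
    threshold = midpoint_threshold(signal)
    domains, start = [], None
    # A trailing sentinel closes a run that extends to the last bin.
    for i, value in enumerate(list(signal) + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if (i - start) * bin_size >= min_size:
                domains.append((start * bin_size, i * bin_size))
            start = None
    return domains


# Toy per-bin signal (illustrative only): a 6-bin run and a 2-bin run.
toy_signal = [0.1] * 2 + [0.9] * 6 + [0.1] * 3 + [0.9] * 2 + [0.1] * 2
domains = call_domains(toy_signal)
print(domains)
```

In the toy signal the 6-bin run (90 kb) is retained and the 2-bin run (30 kb) is rejected by the minimum-size requirement.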

The presence or absence of Hi-C compartments was determined for each 
chromosome as previously described (Lieberman-Aiden et al., 2009) by 
dividing the observed heatmap by the expected heatmap empirically deter- 
mined by dividing the number of observed interactions at a given distance 
by the total number of loci separated by that same distance. The Pearson cor-
relation coefficient between the row and column of the observed/ex-
pected heatmap gives the Pearson correlation heatmap.
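
The compartment calculation described above can be sketched in plain Python. The expected value at each separation is the total number of observed interactions at that separation divided by the number of locus pairs at that separation; the observed heatmap is divided by it, and each entry of the correlation heatmap is the Pearson coefficient between a row and a column of the result. Function names, the handling of constant vectors, and the toy matrix are assumptions for illustration, not the implementation used in the study.

```python
from math import sqrt


def observed_over_expected(obs):
    """Divide each entry by the mean count of all locus pairs at the same separation."""
    n = len(obs)
    expected = []
    for d in range(n):
        pairs = n - d  # number of locus pairs separated by d bins
        total = sum(obs[i][i + d] for i in range(pairs))
        expected.append(total / pairs)
    return [
        [obs[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def correlation_heatmap(oe):
    """Entry (i, j) is the correlation between row i and column j of the O/E map."""
    n = len(oe)
    columns = [[oe[r][c] for r in range(n)] for c in range(n)]
    return [[pearson(oe[i], columns[j]) for j in range(n)] for i in range(n)]


# Toy data (illustrative only): two groups of loci that prefer their own
# group, with contact frequency decaying with linear separation.
group = [0, 0, 0, 1, 1, 0, 1, 1]
obs = [
    [(8.0 if group[i] == group[j] else 2.0) / (1 + abs(i - j)) for j in range(8)]
    for i in range(8)
]
oe = observed_over_expected(obs)
corr = correlation_heatmap(oe)
```

A plaid pattern in `corr` would then indicate shared and divergent interaction preferences, as described for diploid and embryonic nuclei.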

Hi-C data and TADs from Kc167 cells (Hou et al., 2012) and embryonic nuclei
(Sexton et al., 2012) were previously published and obtained from GEO: 
GSE38468 and GEO: GSE34453, respectively. Further details provided in 
the Supplemental Experimental Procedures. 

FISH 

FISH was performed on acid-fixed, squashed salivary glands as previously 
described (Kennison, 2000; Pardue, 2000) with further details provided in the 
Supplemental Experimental Procedures. Primers used to generate FISH 
probes are listed in the Supplemental Experimental Procedures. 
SUMMARY

RAG initiates antibody V(D)J recombination in devel- 
oping lymphocytes by generating “on-target” DNA 
breaks at matched pairs of bona fide recombination 
signal sequences (RSSs). We employ bait RAG- 
generated breaks in endogenous or ectopically in- 
serted RSS pairs to identify huge numbers of RAG 
“off-target” breaks. Such breaks occur at the simple 
CAC motif that defines the RSS cleavage site and are 
largely confined within convergent CTCF-binding 
element (CBE)-flanked loop domains containing 
bait RSS pairs. Marked orientation dependence of 
RAG off-target activity within loops spanning up 
to 2 megabases implies involvement of linear 
tracking. In this regard, major RAG off-targets in 
chromosomal translocations occur as convergent 
RSS pairs at enhancers within a loop. Finally, dele- 
tion of a CBE-based IgH locus element disrupts 
V(D)J recombination domains and, correspondingly, 
alters RAG on- and off-target distributions within IgH. 
Our findings reveal how RAG activity is developmen- 
tally focused and implicate mechanisms by which 
chromatin domains harness biological processes 
within them. 



INTRODUCTION 

During B and T lymphocyte development, exons encoding anti- 
gen-binding immunoglobulin (Ig) or T cell receptor (TCR) variable 
regions are assembled from V, D, and J gene segments by V(D)J 
recombination (Alt et al., 2013). V(D)J recombination is initiated 
by RAG endonuclease, which introduces DNA double-stranded 
breaks (DSBs) between a pair of V, D, and J coding gene seg- 
ments and their flanking recombination signal sequences 
(RSSs) (Schatz and Swanson, 2011). A bona fide RSS comprises 
a conserved palindromic heptamer represented by the canonical 
CACAGTG sequence, a degenerate spacer of 12 or 23 base 
pairs (bp), and a less-conserved A-rich nonamer (Figure 1A;
Schatz and Swanson, 2011). RSSs with 12- or 23-bp spacers
are termed 12RSSs and 23RSSs, respectively. Efficient RAG
cleavage is restricted to a pair of participating coding segments
flanked, respectively, by a 12RSS and a 23RSS, referred to here
as paired bona fide RSSs. This 12/23 RSS restriction helps direct
RAG to appropriate targets within antigen receptor loci (Alt et al.,
2013).
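
The RSS layout and the 12/23 restriction described above can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, the spacer and nonamer strings are arbitrary placeholders, and the only facts taken from the text are the 7-bp heptamer beginning with CAC, the 12- or 23-bp spacer, and the 9-bp nonamer.

```python
HEPTAMER_LEN = 7   # canonical heptamer: CACAGTG
NONAMER_LEN = 9    # A-rich and less conserved

def spacer_length(rss):
    """Spacer length implied by a heptamer-spacer-nonamer string."""
    return len(rss) - HEPTAMER_LEN - NONAMER_LEN

def rss_class(rss):
    """Classify a candidate RSS as '12RSS', '23RSS', or None."""
    rss = rss.upper()
    if not rss.startswith("CAC"):   # conserved start of the heptamer
        return None
    return {12: "12RSS", 23: "23RSS"}.get(spacer_length(rss))

def obeys_12_23(rss_a, rss_b):
    """True when one partner is a 12RSS and the other is a 23RSS."""
    return {rss_class(rss_a), rss_class(rss_b)} == {"12RSS", "23RSS"}

# Placeholder sequences: N-filled spacers and an all-A stand-in nonamer
rss12 = "CACAGTG" + "N" * 12 + "A" * 9
rss23 = "CACAGTG" + "N" * 23 + "A" * 9
print(obeys_12_23(rss12, rss23))  # a 12RSS paired with a 23RSS
print(obeys_12_23(rss12, rss12))  # two 12RSSs
```

The sketch checks length and the CAC start only; it does not score heptamer or nonamer similarity to the consensus.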

RAG cleavage generates a pair of blunt broken RSS ends 
and a pair of hairpin-sealed coding ends (Figure 1B; Schatz 
and Swanson, 2011). Classical non-homologous end joining 
(C-NHEJ) fuses the two RSS ends precisely to form RSS joins 
and opens the two coding-end hairpins and joins them to form 
coding joins, which may be “processed” to lose or gain several 
nucleotides from each end (Figure 1B; Alt et al., 2013). While potential
bona fide RSS-related sequences occur frequently across
the genome, only a small number are documented RAG off-targets
(“cryptic RSSs”) (Merelli et al., 2010). Such RAG off-target
activity contributes to oncogenic deletions or translocations in
immature B and T cell cancers (Boehm et al., 1989; Larmonie
et al., 2013; Onozawa and Aplan, 2012; Papaemmanuil et al.,
2014). While RAG1 and RAG2 bind several thousand genomic
sites that mostly correspond to active promoters and enhancers
(Ji et al., 2010; Teng et al., 2015), lower densities of cryptic RSS
heptamers near transcription start sites may help to limit RAG
off-target activity (Teng et al., 2015).

The mouse IgH locus spans 2.7 megabases (Mb) with VHs and
their downstream 23RSSs embedded in a 2.4-Mb distal portion
separated by a 100-kb intergenic region from DHs flanked on
both sides by 12RSSs and JHs flanked upstream by 23RSSs.
Even though 12/23 restriction should allow VHs to join to un-rearranged
DHs, IgH V(D)J recombination is “ordered,” with DH to JH
joining occurring in early progenitor (pro)-B cells followed by
appendage of a VH to a DJH complex (Alt et al., 2013). Ordered
rearrangement and other levels of IgH V(D)J recombination regulation
are mediated by modulating gene segment accessibility to
RAG (Yancopoulos and Alt, 1986). In this regard, IgH contains a
critical regulatory element termed intergenic control region 1
(IGCR1) within the VH-to-DH interval (Guo et al., 2011). IGCR1
suppresses proximal VH transcription and rearrangement at the
DH-to-JH joining stage and, thereby, mediates broad levels of
V(D)J recombination control, including diversification of antibody
repertoires, by indirectly promoting increased utilization of distal
Figure 1. Abundant DSBs in the 1.8-Mb c-Myc-DJβ Loop Domain in v-Abl Pro-B Cells

(A) Diagram of c-Myc-DJβ and sequences of Dβ1 23RSS (blue) and Jβ1-1 12RSS (orange). Red triangles indicate RAG cleavage sites.

(B) RAG-initiated DSBs in the c-Myc-DJβ cassette participate in cassette DJβ rearrangements but rarely in translocations involving DSBs on other chromosomes.
Red arrows indicate HTGTS primer positions.

(C) Linear plot with a broken y axis showing HTGTS junction profiles in indicated 20-Mb region containing c-Myc-DJβ.

(D) Potential junctional outcomes between bait Dβ1 23RSS coding ends and other DSBs in cis include deletions, excision circles, and inversions.

(E) HTGTS junction profiles in v-Abl cells within indicated 2-Mb region containing c-Myc-DJβ. For all panels, unmarked ticks represent 0. Black lines in the middle
show hotspot (HS) positions (listed in Table S1). Junction numbers and percentages in + or - orientation downstream of c-Myc-DJβ are shown. Cassette location
is shadowed in gray. Star indicates a good cryptic RSS.

(F) ChIP-seq profiles of CTCF and Rad21 in the 2-Mb region defined in (E). CBE orientation is indicated by purple triangles.

(G) Heatmap showing the 1.8-Mb c-Myc loop domain defined by in situ Hi-C data in CH12-LX cell line.

See also Figure S1 and Table S1.



VHs. The most D-proximal VH (VH81X), while preferentially utilized
in wild-type (WT) pro-B cells (Yancopoulos et al., 1984), is even
more frequently utilized upon IGCR1 inactivation (Guo et al.,
2011).

The CTCF factor binds directionally to an ~14-bp DNA target
(Nakahashi et al., 2013), referred to as a CTCF-binding element
(CBE) (Guo et al., 2011). CTCF is implicated in transcriptional insulation
through its ability to mediate chromatin loops (Ong and
Corces, 2014). IGCR1 function relies on two divergently oriented
CBEs within it (Guo et al., 2011). Besides IGCR1 CBEs, the 3' IgH
boundary harbors a CBE cluster (termed “3' CBEs”), and single
CBEs occur just downstream of proximal VHs and in intergenic
regions between distal VHs (Degner et al., 2009). VH CBEs are
convergently oriented with respect to the upstream IGCR1
CBE, and 3' CBEs are convergently oriented with respect to the
downstream IGCR1 CBE (Guo et al., 2011). Mutational studies
of individual IGCR1 CBEs indicated that loop(s) mediated by
the downstream CBE focus RAG activity in early pro-B cells
within a domain containing the DHs and JHs, while a second
domain mediated by the upstream CBE sequesters proximal
VHs from RAG activity (Lin et al., 2015).

Eukaryotic genomes are organized into a hierarchy of archi- 
tectures. Hi-C shows that chromatin is organized into topologi- 
cally associated domains (TADs) that occur on Mb or sub-Mb 
scales and that have high-frequency chromatin interactions 
within them (Dixon et al., 2012; Nora et al., 2012). Boundaries 
of such domains are often co-anchored by long-range interac- 
tions of sites bound by CTCF in association with cohesin 
(Phillips-Cremins et al., 2013; Zuin et al., 2014). Recent higher- 
resolution in situ Hi-C further revealed that mammalian genomes 
are divided into contact domains at an average scale of 185 kb 
(Rao et al., 2014). Contact domains with endpoints that generate
a loop are termed loop domains (Rao et al., 2014). Loop domains
genome-wide are commonly associated with pairs of conver- 
gent CBEs bound by CTCF and cohesin (Rao et al., 2014; Vietri 
Rudan et al., 2015). TADs have been implicated in replication 
timing (Pope et al., 2014), super-enhancer-driven transcription 
(Dowen et al., 2014), and DSB synapsis during antibody class- 
switch recombination (CSR) (Dong et al., 2015; Zarrin et al., 
2007), as well as in promoting normal limb development 
(Lupiáñez et al., 2015). Mechanistic aspects of how loop domains
and TADs function are not well understood.

Our recent studies suggested an unanticipated source of RAG
off-target activity within long chromatin domains. To study oncogenic
consequences, we generated mice with Tcrβ Dβ1 and
Jβ1-1 segments inserted into intron one of the c-Myc oncogene
(“c-Myc-DJβ cassette”). Despite frequent c-Myc-DJβ cassette
recombination in developing lymphocytes, these mice do not
develop lymphoma (Ranganath et al., 2008). However, ATM-deficient,
c-Myc-DJβ cassette mice develop B cell lymphomas
with c-Myc translocations/amplifications that fuse RAG-generated
IgH DSBs to sequences over a several-hundred-kb region
downstream of c-Myc (Tepsuporn et al., 2014). These downstream
translocations occur exclusively on the cassette allele
but do not involve the cassette, suggesting that RAG activity at
bona fide RSS pairs within c-Myc promotes cutting at linked
downstream cryptic RSSs (Tepsuporn et al., 2014). On this
basis, we identify an immense number of previously unsuspected
RAG off-targets generated by a mechanism that has
broader implications for gene regulation within loop domains.

RESULTS 

HTGTS Assay for RAG On-Target and Off-Target DSBs
and Translocations

To test the hypothesis that the c-Myc-DJβ cassette promotes
cutting at cryptic RSSs downstream of c-Myc, we generated a
v-Abl-transformed pro-B cell line from mice homozygous for
the c-Myc-DJβ cassette allele (referred to as “c-Myc-DJβ pro-B
line”). In such lines, RAG expression can be induced in the
context of G1 cell-cycle arrest following treatment with the
v-Abl kinase inhibitor STI-571 (Bredemeyer et al., 2006). Due to the
propensity of cycling v-Abl transformants to form Dβ1-to-Jβ1-1
cassette rearrangements at low level, we were able to isolate
just one v-Abl pro-B clone with an un-rearranged cassette allele
(Figure 1A). This clone had a second cassette allele in DJβ-rearranged
configuration, which is inert for rearrangement (see
below). Upon G1 arrest and RAG expression, the c-Myc-DJβ
construct undergoes high-frequency bona fide Dβ1-to-Jβ1-1
rearrangements, which fuse the downstream coding end
of Dβ1 (23RSS-associated) to the Jβ1-1 12RSS-associated
coding end in the chromosome and, correspondingly, fuse
the Dβ1 23RSS to the Jβ1-1 12RSS within an excision circle
(Figures 1A and 1B). To detect potential cryptic RSSs activated
by the c-Myc-DJβ cassette in these v-Abl pro-B cells,
we employed high-throughput genome-wide translocation
sequencing (HTGTS). HTGTS is a highly sensitive DSB and
translocation assay that identifies junctions between a broken
end of a fixed “bait” DSB and ends of other prey DSBs
genome-wide (Chiarle et al., 2011; Dong et al., 2015; Frock
et al., 2015). For these analyses, we used an HTGTS bait primer
termed “c-Myc E1” that anneals with sequences 213 bp upstream
of the cassette Dβ1 23RSS. This primer detects Dβ1
downstream coding end joins to Jβ1-1 coding ends and to other
DSBs genome-wide (Figure 1B).

In the c-Myc-DJβ pro-B line, the vast majority of recovered
HTGTS junctions represented expected bona fide cassette
Dβ1-to-Jβ1-1 coding joins. To enhance off-target detection,
we experimentally suppressed recovery of bona fide cassette
DJβ joins (Figure 1B; Supplemental Information). The vast majority
of remaining Dβ1 downstream coding-end junctions, representing
1%-3% of total junctions, occurred to sequences up
to 1.8 Mb downstream of c-Myc, with additional joins to sequences
about 1 kb upstream. Notably, the junctions in this
1.8-Mb region abruptly ended in both directions (Figure 1C;
see below). Indeed, the only other clear-cut hotspot region
genome-wide occurred at about 0.02% of total junctions and
involved low-level translocations to Igκ (Figure S1A), a major
bona fide RAG target in v-Abl pro-B cells (Zhang et al., 2012).
Approximately 20% of the apparent RAG off-target sites in the
1.8-Mb domain represented recurrent (“hotspot”) junctions
that, in some cases, were recovered dozens of times in independent
libraries (Table S1). HTGTS analysis of bone marrow (BM)
pro-B cells from c-Myc-DJβ mice gave similar results (Figures
S1A-S1C).

We also isolated an ATM-deficient v-Abl c-Myc-DJβ pro-B line
in which one allele had an inversion that joined the Dβ1 23RSS to
a cryptic RSS (5'-CACAGTT) in the Jβ1-1 segment (Figure S1D).
In this line, the second c-Myc-DJβ allele was in the inert DJβ
configuration. Following G1 arrest, HTGTS employing the
c-Myc E1 primer revealed that the major “bona fide” V(D)J
joining event in this line (>97% of recovered junctions) was inversional
joining of the Dβ1 12RSS (the upstream Dβ1 RSS) to the
inverted Dβ1 23RSS 693 bp downstream (Figure S1D). The vast
majority of remaining joins (~3% of total junctions) fused the
Dβ1 12RSS to other DSBs along the 1.8-Mb cassette-containing
domain with a distribution similar to that of Dβ1 downstream
coding-end joins in the ATM-proficient c-Myc-DJβ pro-B line
and primary pro-B cells (Figures S1A-S1C; Table S1). Notably,
there were increased but still low levels of translocations to Igκ
(~0.2% of total junctions; Figure S1A) as compared to ATM-proficient
pro-B cells. ATM-deficient BM c-Myc-DJβ pro-B cells
also had similar patterns of Dβ1 23RSS coding-end junctions
to those of ATM-proficient pro-B lines, except that they had
low-level translocations to IgH (~0.07%) and TCRα/δ (~0.05%)
(Figures S1A-S1C). Finally, an ATM-deficient v-Abl c-Myc-DJβ
pro-B line in which both cassette alleles were in the DJβ configuration
generated few junctions, confirming that single 12RSS-containing
alleles are inert (Figure S1E).

Abundant DSBs across the 1.8-Mb c-Myc-DJβ Loop
Domain

We investigated the orientation of the thousands of Dβ1 downstream
coding-end junctions within the 1.8-Mb c-Myc region in
the c-Myc-DJβ v-Abl pro-B cell line. Junctions are denoted as
in “+” orientation if prey sequence reads in a centromere-to-telomere
direction and in “-” orientation if prey reads in the opposite
direction (Chiarle et al., 2011). As the c-Myc E1 primer is centromeric
to the bait Dβ1 downstream coding end, it captures junctions
resulting in upstream excision circles and downstream
deletions as + events and captures inversional junctions either
upstream or downstream as - events (Figure 1D). Dβ1 downstream
coding-end junctions near (within 5 kb) c-Myc occurred
at similar frequency in + and - orientations (Figure S1F); strikingly,
however, ~95% of junctions to sequences further downstream
of c-Myc occurred in deletional (+) orientation (Figure 1E).
Similar results also were obtained with ATM-deficient c-Myc-DJβ
pro-B cell lines (Figure S1C), even though their junctions
involved Dβ1-12RSS ends (Figure S1D).
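
The orientation bookkeeping above can be summarized in a small Python sketch. It is a hedged illustration rather than the HTGTS pipeline: the function names and coordinates are hypothetical, coordinates are assumed to increase from centromere to telomere, "downstream" is assumed to mean the telomeric side of the bait break, and the bait primer is assumed to lie centromeric to that break, as for the c-Myc E1 primer.

```python
from collections import Counter

def classify_junction(prey_pos, prey_orientation, bait_pos):
    """Classify one cis junction for a bait primer centromeric to the break.

    '+' prey downstream of the bait  -> deletion
    '+' prey upstream of the bait    -> excision circle
    '-' prey on either side          -> inversion
    """
    if prey_orientation not in ("+", "-"):
        raise ValueError("orientation must be '+' or '-'")
    if prey_orientation == "-":
        return "inversion"
    return "deletion" if prey_pos > bait_pos else "excision circle"

def orientation_summary(junctions, bait_pos):
    """Count outcomes for an iterable of (position, orientation) pairs."""
    return Counter(classify_junction(p, o, bait_pos) for p, o in junctions)

# Made-up junction coordinates, for illustration only
junctions = [(150, "+"), (900, "+"), (50, "+"), (400, "-")]
summary = orientation_summary(junctions, bait_pos=100)
# summary counts: 2 deletions, 1 excision circle, 1 inversion
```

A strong excess of the "deletion" class among distal junctions is the orientation bias reported in the text.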

To gain insight into the basis for the well-defined boundaries of
the DSB hotspot region flanking the c-Myc-DJβ cassette, we
analyzed existing ChIP-seq data from BM pro-B cells (Lin
et al., 2012) and found a cluster of CTCF and cohesin subunit
Rad21-binding sites on both boundaries of this 1.8-Mb domain
(Figure 1F). Moreover, the two clusters of CBEs were in convergent
orientation (Figure 1F). Indeed, recent high-resolution in situ
Hi-C data obtained from mouse CH12-LX B cell lines (Rao et al.,
2014) confirmed that this 1.8-Mb region is a well-defined convergent
CBE-based loop domain that contains within it a strong
840-kb sub-loop that also extends to a convergent CBE (Figures
1F and 1G). HTGTS junction density within the 1.8-Mb domain in
both ATM-proficient (Figures 1E-1G) and ATM-deficient c-Myc-DJβ
pro-B cells (Figure S1C) correlated well with Hi-C interactions
within the two loop domains.

RAG Generates DSBs in the 1.8-Mb c-Myc-DJβ Loop
Domain

To test the relationship of frequent prey DSBs within the 1.8-Mb
c-Myc loop domain to RAG-generated DSBs, we searched ATM-proficient
and ATM-deficient c-Myc-DJβ junctions for sequence
motifs in their vicinity. In this regard, the conserved 5'-CAC motif
of the RSS heptamer is a position indicator for RAG cleavage,
with cleavage invariably occurring 5' to the CAC motif (Figure 1A).
By convention, a CAC is considered in “forward” orientation if
the presumed associated “coding” sequence is centromeric to
the RSS and in “reverse” orientation if the presumed coding
sequence is telomeric (Figure S2A). For widespread CACs, sequences
in the coding position would not generally be gene segments;
thus, we refer to them as surrogate coding ends. For
analysis, we pooled and analyzed, respectively, all + junctions
from the two v-Abl pro-B cell types and found that the majority
occurred in putative surrogate coding sequences at or within
5 bp of a reverse CAC, with ~30% joined directly to the surrogate
coding sequence immediately flanking a CAC (Figures 2A, 2B,
S2B, and S2C). There was no significant correlation with forward
CACs (Figures 2C and S2D). These results suggest that the
frequent DSBs within the 1.8-Mb c-Myc domain occur at
“cryptic RSSs” represented predominantly by a conserved
CAC. Moreover, surrogate coding ends fused to the bait ends
were processed similarly to normal coding ends during V(D)J
recombination. The most highly recurrent hotspot DSBs within
the 1.8-Mb domain tended to involve CACs within more canonical
heptamers (Figure S2E). Finally, remarkably similar results
were obtained from ATM-proficient and ATM-deficient c-Myc-DJβ
BM pro-B cells (data not shown).
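
The forward/reverse CAC convention and the junction-to-CAC distance can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the analysis code used for Figure 2: the function names are hypothetical, the input is the top strand written in the centromere-to-telomere direction, a reverse CAC therefore appears as GTG on that strand, and distance 0 means the junction lies at the nucleotide immediately flanking the CAC on its coding side.

```python
def cac_flank_positions(seq, orientation):
    """0-based positions of the coding-side nucleotide flanking each CAC.

    forward: CAC on the top strand, coding flank just centromeric (index i - 1)
    reverse: CAC on the bottom strand (GTG on top), coding flank just
             telomeric (index i + 3)
    """
    seq = seq.upper()
    if orientation == "forward":
        return [i - 1 for i in range(1, len(seq) - 2) if seq[i:i + 3] == "CAC"]
    if orientation == "reverse":
        return [i + 3 for i in range(len(seq) - 3) if seq[i:i + 3] == "GTG"]
    raise ValueError("orientation must be 'forward' or 'reverse'")

def distance_to_nearest(junction_pos, flanks):
    """Absolute distance (bp) from a junction to the nearest CAC flank."""
    if not flanks:
        return None
    return min(abs(junction_pos - f) for f in flanks)

def fraction_within(junctions, flanks, window=5):
    """Fraction of junctions lying within `window` bp of any CAC flank."""
    if not junctions or not flanks:
        return 0.0
    hits = sum(1 for j in junctions if distance_to_nearest(j, flanks) <= window)
    return hits / len(junctions)

# Made-up sequence: one forward CAC (index 3) and one reverse CAC (GTG at index 9)
seq = "AAACACTTTGTGAAA"
reverse_flanks = cac_flank_positions(seq, "reverse")
forward_flanks = cac_flank_positions(seq, "forward")
```

Comparing `fraction_within` for reverse versus forward flanks on the same junction set mirrors the contrast between Figures 2B and 2C, although a real analysis would also need a background expectation, since CAC is a very common trinucleotide.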

To unequivocally test the role of RAG in generating DSBs in the
1.8-Mb c-Myc loop domains, we deleted Rag2 in the ATM-deficient
c-Myc-DJβ pro-B cell line (Figure S2F). For HTGTS bait, we
employed a Cas9/gRNA to generate DSBs 519 bp downstream
of the c-Myc-DJβ cassette and designed a primer that allowed 5'
broken ends of these DSBs to be used as bait (“5'Cas9 bait
ends”; Figure 2D). We then performed HTGTS on RAG-sufficient
and RAG2-deficient G1-arrested pro-B cells. Recovered 5'Cas9
HTGTS junctions from RAG-sufficient ATM-deficient c-Myc-DJβ
v-Abl pro-B cells correlated with reverse CACs in the 1.8-Mb
domain as expected; however, unlike RAG-generated bait
broken ends, the Cas9/gRNA-generated bait ends recovered
junctions equally in + and - orientation (Figures 2D-2F and
S2G). Performing these assays in ATM-deficient v-Abl pro-B
cells that either lacked the c-Myc-DJβ cassette or were RAG2
deficient generated only a very few junctions within the 1.8-Mb
domain, and these had no correlation with CACs (Figures 2D-2F
and S2H). These findings confirm that RAG generates the
off-target DSBs across the 1.8-Mb c-Myc domain in a
c-Myc-DJβ-cassette-dependent fashion and also demonstrate that
the asymmetric prey-joining preferences observed are specific
to RAG-generated bait ends.

Paired Bona Fide RSSs Generate RAG Off-Target
Activity in Loop Domains Genome-wide

We next tested whether other loop domains genome-wide could
similarly be targets for such widespread RAG-generated DSBs if
they contain bona fide RSS pairs. To insert bait RSSs into multiple
genomic sites, we infected ATM-proficient and -deficient
v-Abl pro-B lines with the pMX-DEL-SJ virus (referred to as
“DEL-SJ”), which harbors a pair of divergent bona fide RSSs
flanking an inverted GFP sequence (Figure 3A; Bredemeyer
et al., 2006). V(D)J recombination between the divergent DEL-SJ
RSSs fuses them in the chromosome and liberates the intervening
GFP DNA within an excision circle generated via fusion of
the surrogate coding ends (Figure 3A). We isolated six independent
sub-clones from each genotype, each with a unique DEL-SJ
integration, treated them with STI-571, and generated
HTGTS libraries with primers adjacent to either the construct
12RSS (12S primer) or 23RSS (23S primer) (Figure S3A). In all
12 DEL-SJ integration sites, the 12RSS and 23RSS junctions
were confined within convergent CBE-based loop domains
that ranged from 174 kb to 2.64 Mb in size (Table S2) and which
often contained sub-domains flanked by convergent CBEs. For
all integration sites, translocation junction density correlated
Figure 2. RAG Generates DSBs across the 1.8-Mb c-Myc-DJβ Loop
Domain

(A) Schematic of translocations between Dβ1 downstream coding ends and
cryptic RSSs mostly represented by CAC motifs in the 1.8-Mb c-Myc domain
in c-Myc-DJβ v-Abl pro-B cells.

(B and C) Distance of Dβ1 downstream coding-end junctions to reverse CACs
(B) or forward CACs (C) within the 1.8-Mb domain in c-Myc-DJβ v-Abl pro-B
cells. Direct joining to the nucleotide immediately adjacent to CAC is defined
as 0 in this and following panels. Mean ± SD, n = 3.



well with interaction intensities revealed by Hi-C. Representative
findings from chromosome X, 4, 12, and 19 integrations are
shown (Figures 3B-3K, S3B, and S3C). Notably, junctions detected
from either 12RSS- or 23RSS-specific primers mostly
occurred in deletional orientation independent of the orientation
in which the DEL-SJ was integrated relative to the centromere
(Figures 3B-3K, S3B, and S3C). As for the c-Myc-DJβ 1.8-Mb
domain, hotspots also were apparent (Figures 3B-3K, S3B,
and S3C).

We examined junction sequences within the two DEL-SJ loop 
domains on chromosome X and 4, respectively, for potential cor- 
relations with forward or reverse CACs. Deletional and excision 
circle junctions represented >95% of events for any given inte- 
gration site (Figures 3B-3K). Strikingly, the vast majority of junc- 
tions were highly correlated with CACs; however, while bait 
RSSs joined to convergent upstream cryptic CACs, they joined 
to surrogate coding ends associated with downstream CACs 
in the same orientation to form apparent “hybrid” RSS-to-cod- 
ing-end joins (Figures 4A-4C and S4A-S4C; but see below). 
Analysis of several other DEL-SJ-containing domains (on chro- 
mosomes 12 and 19) revealed precisely the same patterns 
despite diverse locations and relative chromosomal orientations 
(data not shown). Notably, upstream CACs were generally joined 
precisely to bait RSSs, but downstream joins to surrogate coding 
ends were often imprecise, with deletions of several nucleotides 
from the CAC border (Figures 4A-4C and S4A-S4C). The latter 
result, together with junctional sequence analysis of bait RSS 
ends (Figure S4D), indicates that RSS ends from the DEL-SJ 
construct that join downstream behave like surrogate coding 
ends in a V(D)J recombination-type of joining reaction. 

Normal DEL-SJ V(D)J recombination generates fused RSS 
pairs at a high frequency (Figure 3A) that can be re-cleaved by 
RAG, with one cleavage product then being treated as an RSS 
end and the other as a surrogate coding end (Figure 4D; Meier 
and Lewis, 1993). Thus, the apparent downstream “hybrid joins”
observed with the bait 12RSS, consistent with their end struc- 
ture, could be generated from the fused intermediate. To test 
this possibility, we used as bait the 12RSSs of perfectly fused 
12-23 joins of DEL-SJ within the chromosome X and 4 integra- 
tions, respectively. Indeed, this fused RSS pair faithfully recapit- 
ulated the joining patterns of the parental un-rearranged DEL-SJ 
construct in this location (Figures 4B, 4C, 4E, S4B, and S4E), 
demonstrating that the joining orientation of the two fused 
RSSs determines whether one or the other acts as an RSS end 
or surrogate coding end in the off-target V(D)J recombination 
joining reaction. Finally, we also generated HTGTS libraries 
from the surrogate coding ends (GFP primer) associated 
with 12RSS of DEL-SJ integrated into chromosome X. Such 
surrogate coding end junctions would not be re-cleaved by 
RAG. Correspondingly, nearly 90% of the 12RSS-associated 
coding ends joined downstream of the GFP primer to surrogate 



(D) Schematic of translocations between Cas9/gRNA-initiated bait DSBs and
DSBs in the c-Myc domain in ATM-deficient pro-B cells with (top) or without
(bottom) the c-Myc-DJβ cassette.

(E and F) Distance of 5'Cas9 junctions in + (E) or - (F) orientation to reverse
CACs in the c-Myc domain in cells defined in (D). Means ± SD, n = 3.

See also Figure S2.
Figure 3. DSBs Restricted in Genome-wide DEL-SJ-Containing Loop Domains

(A) Diagram showing major RAG-initiated joins in DEL-SJ-containing v-Abl pro-B cells.

(B) Potential junctional outcomes between bait 12RSS and DSBs in cis include deletions, excision circles, and inversions.

(C) Profiles of 12RSS junctions within chromosome X in ATM-deficient v-Abl pro-B cells. Black triangles indicate insertion site of DEL-SJ. Junction numbers and percentages in + or - orientation upstream or downstream of bait 12RSS are shown separately.

(D and E) Profiles of 23RSS junctions in the indicated 3.5-Mb region containing DEL-SJ on chromosome X.

(F) ChIP-seq profiles of CTCF/Rad21 (top) and heatmap of in situ Hi-C (bottom) in this 3.5-Mb region.

(G-K) 12RSS and 23RSS junctions across the 1-Mb DEL-SJ-containing loop domain on chromosome 4. Stars indicate stronger cryptic RSSs.

See also Figure S3 and Table S2.
coding ends adjacent to CACs (Figures 4F-4H). Together, our
findings show that, for bona fide RSS pairs within a loop domain, 
both the RSS and the associated coding sequence join to 
convergent cryptic RSSs (CACs) and associated surrogate cod- 
ing ends within a loop domain via a V(D)J recombination-like 
reaction. 

Robust Detection of RAG Off-Targets Genome-wide
Outside of Chromatin Domains

We further analyzed the 12RSS-associated coding end (GFP)-primed
DEL-SJ HTGTS libraries from the X chromosome integration
in ATM-deficient v-Abl lines and additional libraries from an
integration on chromosome 1 in a different ATM-deficient v-Abl
line (Figure S5A). Beyond the expected joining patterns within
the DEL-SJ-containing loop domains (Figures S5B-S5D), these
libraries also revealed 107 translocation hotspots across the
genome that all occurred at or near heptamers related to the
canonical CACAGTG motif (Figures 5A, 5B, and S5E; Table S3).
Notably, 60 of the 107 identified cryptic RSSs occurred in
pairs in convergent orientation within <100 kb in the same
domain (Figures 5A, 5C, and S5E; Table S3). HTGTS employing
a primer upstream of the cryptic RSS in one such pair on
chromosome 1 (Figure 5C) revealed thousands of deletional
junctions involving two cryptic RSSs (Figures S5F-S5H). We
compared locations of these 107 cryptic RSSs with existing
pro-B H3K4me3 ChIP-seq data, which marks promoters, or
H3K27Ac data that marks promoters and enhancers (Lane
et al., 2014; Whyte et al., 2013). Strikingly, 97 of the 107
RAG off-targets overlapped with H3K27Ac-marked regions,
with 38 overlapping with super-enhancers and 59 with typical
enhancers. Of these, 65 overlapped with regions marked by
both H3K4me3 and H3K27Ac (Figures 5D and 5E). These
remarkably high correlations demonstrate that accessibility,
beyond RAG binding, also is important for efficient RAG
cleavage at cryptic RSSs.

IgH Employs CBE-Based Subdomains to Regulate RAG
On- and Off-Target Activity

We applied HTGTS to test whether RAG on- and off-target
activity in IgH is confined within IGCR1 CBE-based domains
(Figure 6A). We employed an ATM-deficient v-Abl pro-B cell
line that harbors a DFL16.1-JH3 rearrangement, providing a
Figure 4. Orientation-Biased Joining of RAG-Initiated DSBs in Loop Domains

(A) Schematic of translocation between bait 12RSS and CACs within the DEL-SJ-containing loop domain on chromosome X in ATM-deficient v-Abl pro-B cells.

(B) Distance of 12RSS junctions upstream of bait 12RSS to forward CACs; no correlation was found with reverse CACs.

(C) Distance of 12RSS junctions downstream of bait 12RSS to reverse CACs; no correlation was found with forward CACs.

(D) Diagram of recombination output generated by RAG re-cleavage at perfect 12-23RSS joins.

(E) Profiles of 12RSS junctions of 12-23RSS join within chromosome X in ATM-deficient v-Abl pro-B cells. Star indicates a relatively stronger cryptic RSS, and loop
domain is shadowed in gray, also in (G).

(F) Diagram of bait surrogate coding ends associated with DEL-SJ 12RSS and the potential outcomes.

(G) Profiles of GFP primer junctions within chromosome X in ATM-deficient v-Abl pro-B cells.

(H) Distance of GFP primer junctions downstream of bait surrogate coding ends to forward CACs.

See also Figure S4.
Figure 5. Genome-wide RAG Off-Targets

(A) Circos plot on a custom log scale showing genome-wide translocation profile of GFP primer junctions in DEL-SJ-integrated ATM-deficient v-Abl pro-B cells.
Bin size is 2 Mb, and 50,000 HTGTS junctions are shown. Colored lines link break site to identified cryptic RSSs. 

(B) Consensus sequence of cryptic RSS heptamer extrapolated from the 107 identified cryptic RSSs. 

(C) Paired cryptic RSSs on chromosome 1 (left) and 15 (right). Green arrows indicate the position and orientation of cryptic RSSs in these four translocation 
hotspots. 

(D) Venn diagram showing number of identified cryptic RSSs that overlap with H3K27Ac and H3K4me3. 

(E) Pie chart showing number of identified cryptic RSSs that overlap with typical enhancers and super-enhancers. 

See also Figure S5 and Table S3. 



population of cells harboring a 5'D 12RSS expected to join to
accessible upstream VH 23RSSs (Alt et al., 2013). We used an
HTGTS primer 82 bp upstream of the 5' DFL16.1 12RSS to capture
joins involving bait 5'DFL16.1-JH3 RSS ends (Figure 6B). The
majority of ~27,000 recovered IgH HTGTS junctions were on-targets
at IgH bona fide RSSs (85%; Table S4), with most fusing
the DFL16.1 5'RSS to a VH 23RSS in physiologic (excision circle)
orientation (Figures 6B and 6C). While such junctions involved
multiple VHs across the 2.4-Mb VH domain, they were biased
toward proximal VHs, particularly VH81X (38% of on-targets)
(Figure 6C). We also observed substantial inversional (+) joining
between the DFL16.1 5'RSS and JH4 23RSS (20% of IgH on-targets)
(Figures 6B and 6C). Strikingly, IGCR1 deletion dramatically
increased the number of DFL16.1 5'RSS junctions
recovered (28-fold) (Figure S6A; Table S4), largely from markedly
increased utilization of proximal VHs (48-fold) and, in particular,
VH81X (92% of junctions) (Figures 6D, S6B, and S6D). Correspondingly,
there was an 18-fold decrease in distal/middle VH
utilization and a 20-fold decrease in JH4 junctions (Figures 6D,
S6C, and S6F).

These IgH HTGTS studies also revealed low but highly reproducible
off-target joining of DFL16.1 12RSS ends to DSBs within
IgH that correlated with CACs (Figures S6G and S6H). Strikingly,
~95% of the off-target IgH junctions were within a tightly focused
12.3-kb region that contains the DFL16.1-JH3 and is bounded
upstream by IGCR1 and downstream by iEμ/Sμ (Figures 6E
and 6F). We refer to this region as the iEμ/Sμ-to-IGCR1 “recombination
domain.” Strikingly, deletion of IGCR1 from this ATM-deficient
pro-B line dramatically changed the profile of off-target
DSBs, permitting them to spread ~120 kb upstream into the
proximal VHs while decreasing the percentage of off-target junctions
in the former iEμ/Sμ-to-IGCR1 domain to 13% (Figures 6G
and S6I). Thus, IGCR1 deletion established a new iEμ/Sμ-to-proximal-VH
recombination domain in which RAG activity on
both cryptic RSSs and proximal VH bona fide RSSs is re-focused.
As in other domains, RAG off-target activity was highly
dependent on convergent CAC orientation once several kb from
the DFL16.1 5'RSS break site (Figures S6H and S6J).

DISCUSSION 

Mechanism of RAG Off-Target Activity 

We report a major form of RAG off-target activity that eluded 
prior investigations. Remarkably, this activity is largely confined 
to loop domains containing paired bona fide RSSs, with cleav- 
age requiring only recognition of a simple CAC motif. Also 
remarkable, this off-target RAG activity is directionally oriented 
such that RSS ends from paired bona fide RSSs join to conver- 
gent CAC-containing motifs, while coding ends from paired bona 
fide RSSs join to surrogate coding ends associated with a CAC. 
Thus, RSSs and corresponding coding ends join with the same 
patterns and in the same locations, consistent with V(D)J recom- 
bination (Figure 4). Such orientation dependence is most readily 
explained by a linear tracking mechanism (Yancopoulos et al., 
1984; Figure 7). Based on RAG structural information (Kim 
et al., 2015), we propose a working model (Figures 7 and S7). 
This model assumes that formation and activation of the tetra- 
meric RAG1/2 complex (Lapkouski et al., 2015) requires binding 
of paired bona fide RSSs (Figure 7B). We hypothesize that one or 
the other of these occasionally escapes the activated complex, 
allowing cryptic RSSs to replace them. The replacement process 
could involve diffusion of proximal cryptic RSSs or unidirectional 
tracking to more distal cryptic RSSs downstream. Once appro- 
priately positioned in the activated complex, cryptic RSSs and 
surrogate coding ends could, likely at reduced frequency, be 
cleaved and joined to their remaining bona fide counterpart via 
a reaction that preserves most aspects of normal V(D)J recombi- 
nation (Figure 7G). This general tracking model explains all 
aspects of our findings, including tracking from the two bona 
fide RSSs of a pair in opposite directions around the loop. 
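
The orientation rule described above can be illustrated with a minimal sketch. It assumes a deliberately simplified convention that is not taken from the paper's analysis pipeline: a CAC read on the top strand is labeled "+", a GTG on the top strand (a CAC on the bottom strand) is labeled "-", and a CAC counts as convergent when it lies ahead of the bait RSS and on the opposite strand. The function names and the coordinate convention are hypothetical.

```python
def find_cac_sites(seq):
    """Return (position, strand) for every CAC on either strand of seq.

    A top-strand 'CAC' is reported as '+'; a top-strand 'GTG'
    (the reverse complement of CAC) is reported as '-'.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri == "CAC":
            sites.append((i, "+"))
        elif tri == "GTG":
            sites.append((i, "-"))
    return sites


def convergent_cacs(seq, bait_pos, bait_strand):
    """Positions of CACs oriented toward the bait RSS (simplified assumption).

    For a '+' bait, convergent CACs are downstream and on the '-' strand;
    for a '-' bait, they are upstream and on the '+' strand.
    """
    hits = []
    for pos, strand in find_cac_sites(seq):
        if bait_strand == "+" and pos > bait_pos and strand == "-":
            hits.append(pos)
        elif bait_strand == "-" and pos < bait_pos and strand == "+":
            hits.append(pos)
    return hits
```

Under a tracking model, only the sites returned by `convergent_cacs` would be expected to be used at distance from the bait, whereas a diffusion-only model would predict no strand preference.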

Implications for Normal Loop Domain Functions 

An obvious and important question arises as to why RAG activity 
is so highly restricted within loop domains containing the initi- 
ating paired bona fide RSSs. One contributing factor could be 
high interaction frequency of DNA in chromatin across these do- 
mains (Alt et al., 2013). In this regard, DSB ends find and join to 
ends of other DSBs within such domains at higher frequency 
than elsewhere in the genome (Alt et al., 2013; Zhang et al., 
2012). During IgH CSR, this phenomenon promotes proper and 
frequent joining of AID-initiated DSBs (Dong et al., 2015; Zarrin 
et al., 2007). Such DSB interactions are evident in our current 
studies in which Cas9/gRNA-generated DSBs frequently join to 
RAG off-target DSBs within the same loop. However, distinct 
from RAG-generated RSS or coding ends, a given Cas9/gRNA 
bait end joins to both cryptic RSS ends and surrogate coding 
ends of RAG off-target DSBs. Another apparent difference is 
that site-specific nuclease- or AID-generated DSBs appear to 
find off-targets in other regions across the genome much more 
readily than do RAG-generated DSBs, even in WT cells (Chiarle 
et al., 2011; Dong et al., 2015; Frock et al., 2015). The almost 
exclusive restriction of RAG off-targets to paired bona fide
RSS-containing loops implies that an additional mechanism
enforces such RAG activity.

The tracking mechanism can explain the additional restriction 
of RAG activity within a given loop (Figure 7). We do not know the 
mechanism that propels RAG tracking, although transcription 
and/or cohesin might be involved (e.g., Nichols and Corces, 
2015). However, it is reasonable to assume that tracking is 
terminated when it encounters a block imposed by the CTCF/ 
cohesin-bound convergent CBE pair or similar loop-forming in- 
teractions (Figure 7F). Such blockage would terminate tracking 
in each direction from paired bona fide RSSs and limit off-target 
RAG activity to the loop. In support of this model, deletion of the
CBE-based IGCR1 allows RAG off-target activity to extend from
its initial highly restricted location in the D-JH recombination
domain to >100 kb upstream, where new boundaries may form
via VH CBEs and/or associated factors (Figure 6G). Beyond regulating
V(D)J recombination, related loop domain functions might
impact other activities constrained within them, including
replication (Pope et al., 2014) and promoter/enhancer interactions
(Dowen et al., 2014).

IgH Locus Regulation 

Regulated IgH VH-to-DJH recombination depends on the integrity
of the two divergent CBEs within IGCR1, likely via formation of
loop domains that focus RAG activity on DHs and JHs (Guo
et al., 2011; Lin et al., 2015). Our HTGTS studies provide additional
insights into IgH V(D)J recombination regulation (Figure 6). In a
DJH-rearranged pro-B cell line, on-target rearrangements of the
5'D RSS occur to RSSs of VHs across the locus but predominantly
to 3' VHs (Figure 6C). As most RAG off-target activity is focused in
a small 12.3-kb recombination domain from IGCR1 to the iEμ/Sμ
boundary (Figure S6G), the recombination domain in these cells
does not extend downstream to 3'CBEs as perhaps anticipated.
This restriction could be due to IGCR1 CBE looping with non-CBE
elements at iEμ/Sμ (Guo et al., 2011) and/or to tracking limitations
imposed by a unidirectional mechanism. In the DJH-rearranged
cells, bona fide V(D)J recombination at upstream VHs in the
absence of corresponding off-target activity, even in proximal
portions of the locus, is consistent with VHs entering the recombination
domain by a specialized mechanism operating subsequent
to locus contraction (Bossen et al., 2012). Based on off-target
activity as an assay, IGCR1 deletion extends the recombination
domain linearly into proximal VHs, resulting in a huge overall
V(D)J recombination increase, involving VH81x and other proximal
VHs (Figures 6D and S6D). This increase may be facilitated by
increased interaction frequency gained by placing VH 23RSSs in
the same loop domain as the 5' DFL16.1 12RSS and/or by a
tracking contribution. Finally, a unidirectional RAG tracking mechanism
also might explain why 3'D 12RSSs, but not 5'D 12RSSs,
are used developmentally in D-to-JH rearrangements.

RAG Off-Target Activity, Chromosomal 
Rearrangements, and Cancer 

We prove our hypothesis that inserting paired bona fide RSSs 
into c-Myc activates RAG-generated DSBs at cryptic RSSs 
over a long region downstream that, in the context of ATM defi- 
ciency, promotes oncogenic translocations. These findings 
explain how paired bona fide RSSs within a Tcra excision circle 
Figure 6. Effect of IGCR1 Deletion on RAG Targeting in the IgH Locus

(A) Schematic of murine IgH locus. Purple arrowheads indicate position and orientation of CBEs. Red arches show proposed looping between convergent IgH
CBEs (Lin et al., 2015).

(B) Illustration of possible joining outcomes between bona fide RSSs from DFL16.1 5' RSS bait end to upstream VH and downstream JH broken ends. Red arrows
in all panels indicate the position and orientation of HTGTS primer.

(C and D) Top panels are IgH annotation track. Middle panels show CTCF ChIP-seq profiles. Bottom panels are pooled HTGTS junction profiles for IgH bona fide
RSSs (n = 3). Junctions are displayed as stacked tracks (log scale between tracks, linear scale within each track). Junction numbers and percentages of total
IgH on-targets in indicated regions are shown. We note that, beyond junctions described in the text, we also detected junctions between DFL16.1 5'RSS and
pseudo DH RSSs in the VH-to-DH intervening region in ATM-/- IGCR1Δ cells (green star; see also Figure S6E). We also found a very small number of hybrid joins to
VH coding ends (+ joins) or JH4 coding ends (- joins).

fragment integrated into the HPRT locus in leukemia cells causes
further genomic aberrations (Messier et al., 2006) and also sup- 
port the hypothesis that translocations downstream of c-Myc in 
human B cell lymphomas involve cryptic RSSs (Kroenlein et al., 
2012). Given that cryptic RSS targeting downstream of c-Myc 
occurs in both WT and ATM-deficient pro-B cells, one role of 
ATM in suppressing such translocations would be through stabi- 
lizing ends in RAG post-cleavage complexes to facilitate their 
joining via V(D)J recombination (Bredemeyer et al., 2006). 



Figure 7. Linear Tracking Model for Orien- 
tation-Biased Usage of CACs in the Paired 
Bona Fide RSSs-Containing Loop Domains 

(A) Linear map of a DEL-SJ-containing loop domain. 

(B) Activation of RAG by paired bona fide RSSs in 
the CTCF/cohesin-anchored loop domain. 

(C) One bona fide RSS may escape at a very low 
frequency leaving the activated complex. 

(D) A CAC in convergent orientation of the re- 
maining bona fide RSS binds to RAG to facilitate 
cleavage or initiate unidirectional tracking. 

(E) Further tracking resulting in usage of another 
convergent CAC. 

(F) RAG tracking in the loop in either direction is 
stopped by the boundary formed by the CTCF/ 
cohesin complex. 

(G) Joining products from pairing and cleavage 
between remaining bona fide RSS and conver- 
gent CACs. 

More details are in the text. Various modifications 
of this basic tracking model can be conceived to 
explain joining patterns of proximal V, D, and J 
segments in endogenous antigen receptor loci 
(data not shown). 

See also Figure S7. 



Thus, ATM limits potential RAG-initiated 
translocations by promoting joining of 
RAG-initiated DSBs at RSSs and cryptic 
RSSs within a loop. Our findings also pro- 
vide a mechanism for oncogenic translo- 
cations to sequences far downstream of 
c-Myc in C-NHEJ/p53 double-deficient 
pro-B cells (Alt et al., 2013). In this regard,
we find cryptic RSSs in the c-Myc 1.8-Mb
domain that are closer to consensus 
(Merelli et al., 2010) and, therefore, may 
drive RAG-initiated DSBs at other cryptic 
RSSs in this domain that become liber- 
ated from post-cleavage complexes in 
the absence of C-NHEJ. 

We also found 107 genome-wide 
cryptic RSSs, not related to antigen re- 
ceptor loci or paired bona fide RSSs-con- 
taining domains, that were DSB and translocation targets in 
ATM-deficient v-Abl pro-B cells (Figure 5; Table S3). This set of
cryptic RSSs tended to have heptamers even closer to 
consensus than recurrent hotspots within paired bona fide 
RSSs-containing loops (Figure 5B versus S2E). Many of these 
translocation target RSSs occurred in pairs separated by 
<100 kb (Table S3), with each member of the pair falling directly
within enhancer and/or promoter regions (Figures 5C-5E). 
Enhancer/promoter loops also might increase the frequency 



(E) Pooled RAG off-target junction profile of a 24-kb region including the recombination domain for ATM-deficient cells (n = 3). 

(F and G) Pooled RAG off-target junction profiles of a 240-kb region including the recombination domain for ATM-deficient cells with (F) or without (G) IGCR1,
respectively (n = 3). Brown shadowed region marks the location of IGCR1. Junction numbers and percentages of total IgH off-targets in indicated regions are
shown.

See also Figure S6 and Table S4. 
with which such paired cryptic RSSs are juxtaposed to form stable
RAG synaptic complexes. Strikingly, all 30 pairs of these
cryptic RSS translocation targets were in convergent orientation 
(Table S3), similar to most proximal paired bona fide RSSs within 
antigen receptor loci (Bossen et al., 2012) and the majority of 
cryptic RSSs captured by bona fide RSSs within loop domains. 
Thus, to serve as a strong genome-wide translocation target, 
cryptic RSSs require a good heptamer, location in enhancers
and/or promoters, and convergent pairing with another good 
cryptic RSS in the same loop. Finally, our findings provide a 
mechanistic basis for recurrent oncogenic chromosomal 
interstitial deletions in tumors arising from developing human 
lymphocytes (Larmonie et al., 2013; Mullighan et al., 2008; Pa- 
paemmanuil et al., 2014). 
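
The pairing criteria summarized above (same chromosome, separation under 100 kb, convergent orientation) can be sketched as a simple filter. This is an illustrative example only, not the analysis used in the study: the tuple layout `(chrom, pos, strand)`, the function name, and the convention that a convergent pair is an upstream "+" site followed by a downstream "-" site are all assumptions made for the sketch.

```python
from itertools import combinations


def convergent_pairs(sites, max_dist=100_000):
    """Return pairs of cryptic RSS sites that are convergent and close together.

    sites: iterable of (chrom, pos, strand) tuples, strand being '+' or '-'.
    A pair qualifies when both sites share a chromosome, lie less than
    max_dist bp apart, and the upstream site is '+' while the downstream
    site is '-' (assumed definition of convergent orientation).
    """
    pairs = []
    # Sorting orders sites by chromosome and then position, so within each
    # combination the first element is the upstream site.
    for a, b in combinations(sorted(sites), 2):
        chrom_a, pos_a, strand_a = a
        chrom_b, pos_b, strand_b = b
        if chrom_a != chrom_b:
            continue
        if pos_b - pos_a >= max_dist:
            continue
        if strand_a == "+" and strand_b == "-":
            pairs.append((a, b))
    return pairs
```

Filtering on heptamer quality or on overlap with enhancer and promoter annotations, the other two criteria named in the text, would be applied to `sites` before calling this function.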

EXPERIMENTAL PROCEDURES 
Cell Lines 

BM pro-B cells were purified by αB220 selection from ATM-proficient
and -deficient c-Myc-DJβ mice (Tepsuporn et al., 2014) and were cultured in
opti-MEM medium with 10% (v/v) FBS plus IL-7 (2 ng/ml) and SCF (2 ng/ml) for
4 days. The v-Abl pro-B cells were cultured in RPMI medium with 15% (v/v)
FBS; cells were treated with STI-571 (3 μM) for 4 days to express RAG. WT
and ATM-deficient v-Abl pro-B cell lines were described previously (Zha et al.,
2011). ATM-proficient and -deficient c-Myc-DJβ v-Abl pro-B cell lines were
made specifically for this study from Eμ-Bcl-2 transgenic mice of the corresponding
genotypes. We included the Eμ-Bcl-2 transgene in these cells to protect
STI-571-treated (G1-arrested) v-Abl pro-B cells from apoptosis; prior work showed
that Bcl-2 expression has no effect on V(D)J recombination (Zha et al., 2011).

RAG On- and Off-Targets

HTGTS was performed and analyzed as previously described with modifications
(Frock et al., 2015). Primers for HTGTS are listed in Table S5. Due to the
very low junctional diversity of bona fide V(D)J recombination RSS joins and
coding joins, we included duplicate junctions in our analyses of G1-arrested
v-Abl cells to better reflect the actual frequencies of the various classes of
bona fide and off-target junctions. Where approximate percentage and/or
numbers of different classes of junctions are indicated (e.g., c-Myc-DJβ or
IgH), we controlled for reproducibility by performing at least three independent
experiments (e.g., Table S4). RAG off-target hotspots were identified by
MACS2 (Zhang et al., 2008), with extend size (extsize) at 20 bp and false discov- 
ery rate (FDR) cut-off at 10“®. See Supplemental Information for more details. 
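
The bookkeeping described above (tallying junction classes with duplicates retained and checking that class percentages agree across at least three independent experiments) might look like the following minimal sketch. It is not the HTGTS pipeline itself: the function names, the class labels, and the `tolerance` threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def class_percentages(junction_classes):
    """Percentage of junctions in each class, with duplicates retained.

    junction_classes: one class label per recovered junction.
    """
    counts = Counter(junction_classes)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def reproducible(replicates, cls, tolerance=5.0):
    """Check that a class percentage agrees across independent experiments.

    replicates: list of per-experiment label lists (at least three).
    tolerance: maximum allowed spread in percentage points (assumed value).
    """
    if len(replicates) < 3:
        raise ValueError("at least three independent experiments are required")
    values = [class_percentages(rep).get(cls, 0.0) for rep in replicates]
    return max(values) - min(values) <= tolerance
```

Because duplicates are kept, the percentages reflect how often each class of junction was recovered rather than how many distinct junction sequences were seen.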

ChIP-Seq and Hi-C Data 

CTCF and Rad21 ChIP-seq data were extracted from Lin et al. (2012) (GEO: 
GSE40173); H3K4me3 and H3K27Ac ChIP-seq data were extracted from 
Lane et al. (2014) (GEO: GSE48555). These data are from BM pro-B cells. 
We re-analyzed ChIP-seq data with Chilin software (http://cistrome.org/ 
chilin/) in the simple model against mm9. Enhancer annotation was either 
extracted directly from Whyte et al. (2013) (GEO: GSE44288) or identified by 
Homer software (Heinz et al., 2010) from re-analyzed H3K27Ac ChIP-seq 
data (Lane et al., 2014). In situ Hi-C data for CH12-LX B cells was extracted 
and displayed (KR normalization) by Juicebox software (Rao et al., 2014). 

ACCESSION NUMBERS 

The Gene Expression Omnibus (GEO) accession number for the datasets re- 
ported in this paper is GEO: GSE73007. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and five tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.10.016.
SUMMARY

Alterations in estrogen-mediated cellular signaling
play an essential role in the pathogenesis of endometriosis.
In addition to higher estrogen receptor (ER) β
levels, enhanced ERβ activity was detected in endometriotic
tissues, and the inhibition of enhanced ERβ
activity by an ERβ-selective antagonist suppressed
mouse ectopic lesion growth. Notably, gain of ERβ
function stimulated the progression of endometriosis.
As a mechanism to evade endogenous immune
surveillance for cell survival, ERβ interacts with
cellular apoptotic machinery in the cytoplasm to
inhibit TNF-α-induced apoptosis. ERβ also interacts
with components of the cytoplasmic inflammasome
to increase interleukin-1β and thus enhance
its cellular adhesion and proliferation properties.
Furthermore, this gain of ERβ function enhances
epithelial-mesenchymal transition signaling, thereby
increasing the invasion activity of endometriotic tissues
for establishment of ectopic lesions. Collectively,
we reveal how endometrial tissue generated
by retrograde menstruation can escape immune surveillance
and develop into sustained ectopic lesions
via gain of ERβ function.

INTRODUCTION 

Endometriosis is a medical condition in which endometrial 
cells are deposited and grow outside the uterine cavity (Bulun, 
2009; Giudice, 2010). Severe symptoms of endometriosis are 
typically observed in 6%-10% of reproductive-aged women (Simoens
et al., 2007). Among patients with endometriosis, approximately
50% have major pelvic pain, and 40%-50% have fertility
problems (Eskenazi and Warner, 1997; Ozkan et al., 2008). In 
these patients, endometriosis-associated symptoms negatively 
impact their health and quality of life (Moradi et al., 2014). 
To improve the efficiency of endometriosis therapies, it is
important to dissect the unique molecular properties of endometriotic
tissues compared with normal endometria. Previous
studies identified several endocrine properties associated with
endometriotic tissues. Altered estrogenic signaling pathways
have been reported in endometriosis pathogenesis (Bulun,
2009). Endometriotic lesions have been reported to contain
higher 17β-estradiol levels than normal endometria due to the
elevated expression of 17β-hydroxysteroid dehydrogenase-1
and aromatase genes (Aden
et al., 2007; Delvoux et al., 2009). These higher levels of local
17β-estradiol could play a role in the proliferation of endometriotic
tissues (Zhang et al., 2010). This increased 17β-estradiol
binds and activates estrogen receptors (ERs) in endometriotic
tissues to stimulate estrogen-dependent growth. There are two
different forms of the ER, usually referred to as α and β, each
encoded by a different gene, ESR1 and ESR2, respectively. Prior
studies with ERα-/- and ERβ-/- mouse models and selective estrogen
receptor modulators revealed essential roles of both ERα
and ERβ in mouse ectopic lesion development (Burney, 2013;
Zhao et al., 2015). However, each ER isoform has a unique
expression pattern between endometriotic tissues and normal
endometrium. In the case of ERα, it is controversial whether
ERα has an endometriotic tissue-specific pattern (Han and
O’Malley, 2014). In contrast to ERα, however, the mRNA level
of ERβ is significantly higher in endometriotic tissues than in
normal uterine endometrium (Bulun et al., 2012). Aberrant ERβ
levels in endometriotic tissues have been associated with a
distinct epigenomic profile in the ERβ genomic locus: a hypomethylated
promoter of the ERβ gene was detected in endometriotic
tissues compared with normal endometria and correlates
with increased ERβ mRNA levels (Xue et al., 2007).

What is the role of ER isoforms in the pathogenesis of endometriosis?
Unfortunately, the detailed molecular mechanism
by which each ER isoform contributes to
endometriosis progression has not yet been clearly elucidated. Only
partial information is available. For example, ERα-/- mice
with endometriosis revealed that the ESR1 gene is required
for attachment, inflammation, and proliferation of ectopic
lesions (Burns et al., 2012). ERβ directly induces Ras-like
(Figure 1 panels: ERβ, ERα, PRB, PRA, and tubulin in sham, eutopic, and ectopic mouse endometrium.)


Figure 1. Mouse Endometriotic Tissues
Have Elevated Levels of ERβ

(A and B) The expression levels of ERβ, ERα, PR,
and tubulin in the uteri of sham-treated C57BL/6J
mice and the eutopic endometria (A) and ectopic
lesions (B) of C57BL/6J mice with endometriosis.
(C) IHC and quantitative analyses of ERβ levels in
the uteri of sham-treated C57BL/6J mice and
ectopic and eutopic endometria of C57BL/6J mice
with endometriosis.

In all panels, error bars represent ± SD. 



observed SRC-1 coactivator isoform, 
these two drivers of endometriotic dis- 
ease cooperate to render endometriosis 
a therapeutically complex disease. 
estrogen-regulated growth-inhibitor gene expression in an
estrogen-dependent manner to enhance the proliferative activity
of endometriotic tissues (Monsivais et al., 2014). In addition,
ERβ directly binds to the ERα promoter region to repress
ERα gene expression, which can lead to a state of progesterone
resistance in the endometriotic tissues by suppressing
ERα-mediated progesterone receptor (PR) expression
(Bulun et al., 2012). However, we believe
the complete repertoire of ERβ functions to be more complicated
because greatly elevated levels of ERβ exist in both
nuclear and cytoplasmic locations in endometriotic tissues
(Cheng et al., 2011). A more detailed investigation
is therefore needed to fully understand the mechanisms of
ERβ action in endometriosis progression.

We propose a cytoplasmic ERβ protein network that, in addition
to its genomic functions, promotes endometriosis pathogenesis
in a non-genomic manner. Together with our previously



Mouse Endometriotic Tissues Have
Elevated ERβ Levels Similar to
Those in Human Endometriotic
Cells

Human endometriotic cells isolated from
endometriosis patients have higher levels
of ERβ, but not ERα, than do normal human
endometrial cells (Han et al., 2012).
Consistent with human endometriotic
cells, both eutopic and ectopic endometria
from mice with endometriosis also
had markedly higher ERβ levels compared
with the uteri of sham-treated mice (Figures
1A and 1B). In contrast to ERβ, however,
the levels of ERα did not differ in
eutopic endometria but were reduced in
ectopic lesions compared with sham-treated
uteri (Figures 1A and 1B). Levels
of PR were reduced in both ectopic lesions
and eutopic endometria of mice
with endometriosis compared with the
uteri of sham-treated mice (Figures 1A
and 1B). Immunohistochemistry (IHC) using an ERβ antibody
(validation of its specificity in Figure 4B) revealed elevated ERβ
levels in the epithelial and stromal compartments of both ectopic
lesions and eutopic endometria compared with those compartments
in normal endometrium (Figure 1C). Therefore, the ERβ
levels are elevated in endometriotic tissues of mice with endometriosis,
similar to the levels observed in human endometriotic
cells.



Endometriotic Tissues Have Enhanced ERβ Activity
Compared with Normal Endometria

To determine ERβ activity in endometriotic tissues in vivo, we
generated an ERβ activity indicator (ERBAI) mouse containing
a modified ERβ bacterial artificial chromosome clone that
has a Gal4 DNA-binding domain (DBD) instead of its own DBD
and a hrGFP reporter controlled by the Gal4-upstream activating
sequence (UAS) according to our prior protocol (Han et al., 2009)
Figure 2. Enhanced ERβ Activity Is Detected
in the Endometriotic Tissues of Mice with
Endometriosis Compared to Normal Endometria

(A) Generation of a modified ERβ bacterial artificial
clone that has a Gal4 DNA-binding domain and
the Gal4-UAS-hrGFP reporter. DY380, bacterial
recombination strain; KanR, kanamycin-resistant
gene; DBD, DNA-binding domain; Gal4-UAS,
Gal4-upstream-activating sequence; FLP, flippase;
hrGFP, humanized renilla GFP.

(B) IHC analyses of hrGFP levels in the uteri of
sham-treated ERBAI mice and ectopic and eutopic
endometria of ERBAI mice with endometriosis.

(C and D) The quantification of hrGFP levels in the
epithelial (C) and stromal (D) compartments of
each type of endometrium in (B).
In all panels, error bars represent ± SD. See also
Figure S1.
(Figures 2A and S1). Therefore, Gal4-ERβ binds to Gal4-UAS,
driving hrGFP gene expression in response to hormone.
Detailed information regarding the generation and validation of
the ERBAI mouse model is described in the Supplemental Information
(Figure S1).

To investigate potential alterations in ERβ activity in endometriotic
tissues during endometriosis progression, endometriosis
was surgically induced using an ERBAI mouse model
through autotransplantation; to monitor ERβ activity, the hrGFP
levels were determined in ectopic and eutopic endometria of
ERBAI mice with endometriosis and in the uteri of sham-treated
ERBAI mice. Elevated hrGFP levels were detected in epithelial
and stromal cells of ectopic and eutopic endometria compared
with those found in the normal uteri of sham-treated ERBAI
mice (Figures 2B-2D). Therefore, enhanced ERβ activities
were detected in the stromal and epithelial compartments of endometriotic
tissues compared with normal endometria.



Enhanced ERβ Activity Is Required
for Ectopic Lesion Growth in Mice
with Endometriosis

Although enhanced ERβ activity was
detected in endometriotic tissues, it was
not clear whether the enhanced ERβ activity
was required for ectopic lesion
growth. Therefore, PHTPP, an ERβ-selective
antagonist (Compton et al., 2004),
was employed to address this question. Ectopic
lesions were surgically developed in
ovariectomized C57BL/6J mice containing
an estradiol (E2) pellet. On the
21st day after endometriosis induction,
PHTPP or vehicle was administered
to endometriosis-induced mice (Figure
S2A). Compared with vehicle treatment,
PHTPP treatment significantly suppressed
ectopic lesion growth in mice
with endometriosis (Figure 3A). For the
stimulation of ectopic lesion growth, endometriotic
tissues recruit immune cells
(CD163-positive monocyte/macrophage cells) to enhance immune
cell-mediated cytokine signaling (Figure S2B, arrowhead).
However, PHTPP-treated ectopic lesions did not recruit immune
cells compared with vehicle-treated ectopic lesions (Figure S2B).
In addition, PHTPP treatment clearly diminished ERβ activity in
the epithelial and stromal compartments of ectopic lesions and
the eutopic endometrium of mice with endometriosis compared
with vehicle (Figures 3B and 3C). Collectively, PHTPP inhibits
ERβ activity, which leads to the suppression of endometriotic lesion growth in
mice with endometriosis.

Anti-apoptosis signaling and the acceleration of proliferation 
are typical molecular properties associated with the survival of 
endometriotic tissues (Pellegrini et al., 2012; Salmassi et al., 
2011). Because endometriotic tissues consist of epithelial and 
stromal compartments, signaling communication between these 
compartments plays an essential role in endometriotic lesion 
progression (Kim et al., 2013). Therefore, functional defects 
involving hyperproliferation and anti-apoptosis signaling in either
compartment in endometriotic tissues should impair cellular 
processes in the other compartment, ultimately leading to the 
suppression of ectopic lesion growth. PHTPP treatment, 
compared with vehicle treatment, reduced proliferative activity 
as determined by Ki-67 in the epithelial, but not stromal, com- 
partments of ectopic lesions in C57BL/6J mice with endometri- 
osis (Figure 3D). In the case of apoptosis signaling as determined 
by cleaved caspase 8 levels, PHTPP treatment significantly 
enhanced apoptotic signaling in both the epithelial and stromal 
compartments of ectopic lesions of C57BL/6J mice with endo- 
metriosis compared with vehicle treatment (Figure 3E). In addi- 
tion to ectopic lesions, PHTPP suppressed proliferation and 
anti-apoptosis signaling in the epithelium and inhibited prolifera- 
tion in stromal cells in the eutopic endometria of mice with endo- 
metriosis (Figures 3F and 3G). 

In addition to ERβ, PHTPP can inhibit ERα activity in vivo,
although its effects on ERα are minimal (Compton et al., 2004).
To address this issue, the levels of mouse uterine ERα direct
target genes (such as PR, CDKN1A, and ERRFI1) were examined
in ovariectomized mice upon E2 and/or PHTPP treatment.
PHTPP partly reduced the expression of direct ERα target genes
stimulated by E2 (Figures S2C-S2E). Interestingly, a female
mouse fertility assay revealed that PHTPP did not reduce reproductive
activity in female mice, whereas an ERα-selective antagonist,
MPP dihydrochloride, significantly reduced the fertility of
female mice compared with vehicle treatment (Figure S2F).
Therefore, PHTPP does not disrupt the fertility of female mice,
though it partly suppresses uterine ERα activity. In contrast to
the effects of PHTPP, ERB-041 (ERβ-specific agonist) treatment
enhanced mouse ectopic lesion growth compared with
vehicle (Figure S2G).

To address the effects of ERB-041 and PHTPP on human
endometriotic lesion growth, we employed two types of human
endometrial cells: EMosis-CC/TERT cells, which are immortalized
human endometriotic epithelial cells isolated from ovarian
endometriomas (Bono et al., 2012), and immortalized human
endometrial stromal cells (iHESCs) (Krikun et al., 2004). For
simplification, EMosis-CC/TERT cells are called immortalized
human endometriotic epithelial cells (iHEECs) hereafter. For
non-invasive bioluminescence imaging analyses of ectopic
lesions in SCID mice, luciferase reporter-expressing iHEECs
(iHEECs/Luc) and iHESCs (iHESCs/Luc) were generated using a
lentiviral expression system. To induce endometriosis, a mixture
of epithelial and stromal cells (iHEECs/Luc plus iHESCs/Luc)
was injected into ovariectomized SCID mice with an E2 pellet.
On the 21st day after endometriosis induction, endometriosis-induced
SCID mice were treated with ERB-041 or PHTPP for
another 21 days (Figure S2A). Ectopic lesion image analyses revealed
that ERB-041 treatment stimulated human ectopic lesion
growth, whereas PHTPP treatment decreased ectopic lesion
growth in SCID mice (Figure S2H). Moreover, ERBAI mice with
endometriosis also revealed that ERB-041 enhanced ERβ activity
in ectopic lesions compared with vehicle treatment (Figure
S2I). Collectively, enhanced ERβ activity is required for the
pathogenesis of endometriosis (Table 1).

Loss of ERβ Function Suppresses Ectopic Lesion
Growth in Mice with Endometriosis

To directly investigate the effect of loss of ERβ function on the pathogenesis
of endometriosis, endometriosis was surgically induced via
the auto-transplantation of uterine tissue using ERβ-/- (Krege
et al., 1998) and wild-type (WT) mice. The sizes of the ERβ-/-
ectopic lesions were reduced significantly compared with WT
ectopic lesions (Figure 4A). IHC using an ERβ antibody (Saji
et al., 2000) validated the fact that ERβ-/- ectopic lesions did
not exhibit ERβ expression compared with WT ectopic lesions
(Figure 4B).

To investigate how loss of ERβ function impacts ectopic
lesion progression, cell proliferation and apoptotic signals in
each type of ectopic lesion were examined. Reduced levels
of epithelial, but not stromal, proliferation were detected in
ERβ-/- ectopic lesions compared with WT ectopic lesions (Figure
4C and Table 1). In contrast to proliferation, however, loss of
ERβ function significantly elevated epithelial, but not stromal,
apoptosis in ERβ-/- ectopic lesions (Figure 4D and Table 1).

Regarding eutopic endometrium, loss of ERβ function did not
impair the proliferation of ERβ-/- eutopic endometria compared
with WT eutopic endometria during endometriosis progression
(Figure 4E and Table 1). However, apoptosis signaling was
elevated in both compartments of the eutopic endometrium in
the absence of the ERβ gene compared with WT eutopic endometrium
(Figure 4F and Table 1).

Gain of ERβ Function Stimulates Ectopic Lesion Growth
in Mice with Endometriosis

To mimic ERβ elevation in human and mouse endometriotic tissues,
an endometrium-specific ERβ-overexpressing mouse
model was generated and validated (Figure S3). For simplicity,
ROSA^LSL:ERβ/+ monogenic mice, which do not express
exogenous ERβ, and endometrium-specific ERβ-overexpressing
(ROSA^LSL:ERβ/+:PR^Cre/+) bigenic mice are hereafter referred
to as control and ERβ:OE mice, respectively (Figure S3A).

To determine whether endometrium-specific ERβ overexpression
impacts ectopic lesion growth, endometriosis was surgically
induced by auto-transplantation using ovariectomized



Figure 3. ERβ-Specific Antagonist Regresses Ectopic Lesion Growth

(A) Ectopic lesions isolated from C57BL/6J mice with endometriosis subcutaneously treated with vehicle or PHTPP. 

(B and C) IHC and quantitative analyses of hrGFP levels in ectopic lesions (B) and eutopic endometria (C) of ERBAI mice with endometriosis subcutaneously 
treated with vehicle or PHTPP. 

(D and E) IHC and quantitative analyses of the expression patterns of Ki-67 (D) and cleaved CSP8 (E) in ectopic lesions of C57BL/6J mice with endometriosis 
subcutaneously treated with vehicle or PHTPP. 

(F and G) IHC and quantitative analyses of the levels of Ki-67 (F) and cleaved CSP8 (G) in the eutopic endometria of C57BL/6J mice with endometriosis
subcutaneously treated with vehicle or PHTPP. PLC, percentage of labeled cells; CSP8, caspase 8.

In all panels, error bars represent ± SD. See also Figure S2.

Table 1. Proliferation and Apoptosis in Ectopic Lesions and
Eutopic Endometria of PHTPP-Treated, ERβ−/−, and ERβ:OE Mice
with Endometriosis

Cellular Process        Type of Endometrium   Compartment   PHTPP   ERβ−/−   ERβ:OE
Proliferative activity  ectopic lesions       epithelium    -       -        +
                                              stromal       0       0        +
                        eutopic endometrium   epithelium    -       0        +
                                              stromal       -       0        +
Apoptosis signaling     ectopic lesions       epithelium    +       +        -
                                              stromal       +       0        -
                        eutopic endometrium   epithelium    +       +        0
                                              stromal       0       +        0
Ectopic lesion volume                                       -       -        +

+: increased, 0: no change, -: decreased compared to vehicle treatment,
WT, or control mice.



control and ERβ:OE mice containing E2 pellets. ERβ:OE ectopic
lesions had much larger volumes than did control ectopic lesions
in mice with endometriosis (Figure 5A). Exogenous Flag/Myc-
tagged ERβ expression and elevated levels of total ERβ were
detected in ERβ:OE ectopic lesions compared with control
ectopic lesions (Figures 5B and S3). Thus, elevated ERβ levels
in ectopic lesions enhanced ectopic lesion growth. The
overexpression of nuclear receptors can induce ligand-independent
effects (Weigel and Zhang, 1998). However, neither control nor
ERβ:OE endometrial tissue fragments successfully developed
into ectopic lesions in ovariectomized mice without the
administration of an E2 pellet (Figure S3F). Therefore, the
stimulation of ectopic lesion growth by gain of ERβ function is an
estrogen-dependent process.

Proliferative activity was significantly elevated in both the
epithelial and stromal compartments of ERβ:OE ectopic lesions
compared with control ectopic lesions (Figure 5C and Table 1).
In contrast, apoptosis was significantly reduced in both the
epithelial and stromal compartments of ERβ:OE ectopic lesions
compared with control lesions (Figure 5D and Table 1).

In addition to ectopic lesions, both the epithelial and stromal
compartments of the ERβ:OE eutopic endometrium demonstrated
enhanced proliferative activity compared with control
eutopic endometrium (Figure 5E and Table 1). However, no
alteration in apoptosis signaling was detected in ERβ:OE eutopic
endometrium compared with control eutopic endometrium
(Figure 5F and Table 1). Thus, the eutopic endometria of
endometriosis patients appear primarily to be in a hyperproliferative
state due to elevated ERβ levels. Notably, breeding trials designed
to assess mating success revealed that ERβ:OE mice were
infertile compared with control mice (Figure S3G). Moreover,
ERβ-overexpressing iHESCs lose their decidualization response
because, as compared with parental iHESCs, the induction of
decidual cell marker genes, such as insulin-like growth factor-
binding protein 1 and prolactin, was significantly reduced upon
estradiol-medroxyprogesterone-cAMP treatment (Figures 5G-5I).
Therefore, endometriosis-associated ERβ overexpression
in eutopic endometrium might impair the decidualization
process in women with endometriosis, leading to infertility.



ERβ Interacts with the Cytoplasmic Apoptosis and
Inflammasome Machinery in Ectopic Lesions to
Enhance Ectopic Lesion Survival

To further dissect the molecular mechanisms of ERβ in
endometriosis progression, Flag-tagged ERβ-containing protein
complexes were immunoprecipitated (IPed) from the eutopic
endometria of ERβ:OE mice with endometriosis using a Flag
antibody. In IP/Mass analyses, a primary consideration is to
separate proteins that interact non-specifically with beads from
the proteins that are associated with the target protein.
For this purpose, we employed control mice that had the same
genetic background as ERβ:OE mice and had extremely low
levels of Flag-tagged exogenous ERβ compared with ERβ:OE
mice (Figure 6A). Therefore, proteins co-IPed with the Flag
antibody from endometriotic tissues of control mice are considered
non-specific bead-binding proteins. To specifically identify
ERβ-interacting proteins, proteins IPed from the eutopic
endometria of control mice were removed from the list of proteins
IPed from the eutopic endometria of ERβ:OE mice. Gene
Ontology analyses of these endometriotic tissue-specific
ERβ-interacting proteins revealed that large numbers of proteins
involved in inflammation and apoptosis signaling were specifically
co-IPed with ERβ from ERβ:OE eutopic endometrium (Figures
S4A and S4B). To validate these interactions in ectopic lesions,
the ERβ complex was isolated from control and ERβ:OE ectopic
lesions using a Flag antibody, and ERβ-interacting proteins
were analyzed further by western blot analyses (Figure 6A).
Western blot analyses revealed that only a very weak Flag-ERβ
signal was detected in control ectopic lesions that developed
in ROSA^LSL:ERβ/+ monogenic mice; this small amount is likely
attributable to the leaky expression of exogenous ERβ in the
ROSA^LSL:ERβ/+ monogenic mouse (Figure 6A).
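
The background-subtraction step described above amounts to a set
difference between two hit lists. The following is a minimal sketch in
Python, assuming each IP is summarized as a list of protein names. The
function name and both hit lists are hypothetical illustrations (the
"bead-binder" entries are placeholders), not data or code from this study.

```python
def specific_interactors(bait_ip, control_ip):
    """Return proteins present in the bait IP but absent from the control IP.

    Proteins that also appear in the control IP are treated as
    non-specific bead-binding proteins and removed.
    """
    return sorted(set(bait_ip) - set(control_ip))


# Hypothetical hit lists for illustration only.
erb_oe_hits = ["ASK-1", "STRAP", "14-3-3", "caspase 8",
               "bead-binder A", "bead-binder B"]
control_hits = ["bead-binder A", "bead-binder B"]

candidates = specific_interactors(erb_oe_hits, control_hits)
print(candidates)
```

A plain set difference treats any protein seen in the control IP as
background regardless of abundance; a quantitative enrichment threshold
would be a stricter alternative, but the text describes simple removal.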

Apoptosis signal-regulating kinase 1 (ASK-1) was found to
interact prominently with ERβ in ectopic lesions (Figure 6A).
ASK-1 is a component of tumor necrosis factor alpha (TNF-α)-
induced apoptosis complex I, and its activation is required for
TNF-α-induced apoptosis in multiple cell types (Tobiume et al.,
2001). In addition to ASK-1, serine/threonine kinase receptor-
associated protein (STRAP) and 14-3-3 were also specifically
co-IPed with ERβ from ERβ:OE ectopic lesions, but not
from control ectopic lesions (Figure 6A). To prevent
TNF-α-induced apoptosis, STRAP and 14-3-3 proteins interact with
ASK-1 to disrupt associations between TNF receptor-associated
factor 2 (TRAF2) and ASK-1 upon TNF-α stimulation (Hatai
et al., 2000). These data imply that ERβ may induce
ASK-1/STRAP/14-3-3 complex formation to prevent the activation of
TNF-α/ASK-1-mediated apoptosis in endometriotic tissues. To
test this hypothesis, the levels of ASK-1 phosphorylation at
Thr845 were determined for each type of ectopic lesion because
ASK-1 phosphorylation at Thr845 is associated with ASK-1
activation to enhance TNF-α-induced apoptosis (Tobiume et al.,
2002). The phospho-Thr845 ASK-1 levels were significantly
reduced in ERβ:OE ectopic lesions compared with control
ectopic lesions without alteration of total ASK-1 levels (Figures
6B-6D). In contrast, the levels of total ASK-1 and phospho-
ASK-1 were significantly elevated in ERβ−/− ectopic lesions
compared with WT ectopic lesions (Figures S5A-S5C).
Collectively, ERβ induced ASK-1/STRAP/14-3-3 complex formation
to prevent ASK-1 activation in ectopic lesions, thereby
promoting ectopic lesion survival. TNF-α-induced ASK-1 activation
also increases mitochondrial cytochrome c levels to activate
caspase 9 (Hatai et al., 2000). The cytochrome c levels in ERβ:OE
ectopic lesions were significantly lower than those in control
ectopic lesions (Figure 6E). Therefore, the gain of ERβ function
blocked the TNF-α/ASK-1/cytochrome c signaling pathway
in ectopic lesions to promote lesion survival.

After the initiation of TNF-α-induced apoptosis by apoptosis
complex I, the tumor necrosis factor receptor (TNFR) 1-associated
death domain (TRADD) protein, a component of complex I,
is shuttled from TNFR to the cytoplasm and then interacts
with the Fas-associated via death domain (FADD) protein and
caspase 8 to generate apoptosis complex II, which amplifies
TNF-α-induced apoptosis (Micheau and Tschopp, 2003). The
endometriotic 70 kDa SRC-1 isoform also interacts with caspase 8
to inhibit caspase 8 activation in ectopic lesions and promote their
survival (Han et al., 2012). Interestingly, we also found that ERβ
interacted with caspase 8 and this SRC-1 isoform in ectopic
lesions (Figures 6A and S4C). Moreover, ERβ:OE ectopic lesions
contained significantly reduced levels of cleaved caspase 8
compared with control ectopic lesions (Figure 5D). Therefore,
we suggest that ERβ also interacts with caspase 8 along with
the SRC-1 isoform; this combined interaction strongly inhibits
caspase 8 activation in ectopic lesions and thereby prevents
activation of TNF-α-induced apoptosis complex II. However,
this SRC-1 isoform/ERβ/caspase 8 complex
did not interact with components of TNF-α-induced apoptosis
complex I or the apoptosome in ectopic lesions (Figure S4D).

To validate synergism between the SRC-1 isoform and ERβ in
the progression of endometriosis, a combination of PHTPP and
Gossypol, which reduces the transcriptional activity and stability
of SRC-1 (Wang et al., 2011), was employed to suppress
ectopic lesion growth in mice with endometriosis. This
combination of Gossypol and PHTPP treatment significantly reduced
ectopic lesion growth compared with individual treatments
(Figures 6F and 6G). Therefore, cooperative interactions between
ERβ and the SRC-1 isoform appear to drive the
pathogenesis of endometriosis.

Cytochrome c induces the formation of the
apoptosome, which consists of caspase 9 and apoptotic
peptidase-activating factor 1 (APAF-1), to activate caspase 9 (Bratton
and Salvesen, 2010). In ERβ:OE ectopic lesions, the interaction
of caspase 9 and APAF-1 was not detected (Figures 6H and
S5D). The cleaved caspase 9 levels were significantly reduced
in ERβ:OE ectopic lesions compared with those in control
ectopic lesions (Figure 6I). In ERβ−/− ectopic lesions, however,
the interaction of caspase 9 and APAF-1 was detected
(Figure S5D). These data suggest that ERβ prevented
TNF-α-induced apoptosome formation in endometriotic cells by
disrupting the interaction of caspase 9 and APAF-1 through a
competitive ERβ interaction with caspase 9. Collectively, ERβ
inhibited the activation of apoptosis complex I
and complex II as well as apoptosome formation in ectopic lesions,
thereby preventing TNF-α-induced apoptosis in endometriotic
tissues and supporting ectopic lesion survival.

Caspase 1 and NLR family pyrin domain-containing 3
(NALP3) were also co-IPed with ERβ from ERβ:OE ectopic
lesions (Figures 6H and S4B). Both caspase 1 and NALP3 are
components of the inflammasome, which is involved in the
maturation of IL-1β from pro-IL-1β (Willingham et al., 2009).
Interestingly, the NALP3-mediated inflammasome has an essential
role in endometriosis progression because NALP3−/−
ectopic lesion volume was significantly reduced compared
with WT ectopic lesions of mice with endometriosis (Figure 6J).
IL-1β is a key cytokine involved in both the adhesion and
proliferation of endometrial cells (Cao et al., 2005; Sillem et al., 1999).
ERβ:OE ectopic lesions had higher IL-1β levels than control
ectopic lesions, consistent with the highly elevated cleaved
caspase 1 levels in ERβ:OE ectopic lesions compared with controls
(Figures 6K and 6L). In contrast, cleaved caspase 1 and IL-1β
levels were reduced in ERβ−/− ectopic lesions compared with
WT lesions of mice with endometriosis (Figure S5A). Therefore,
the interactions of ERβ with caspase 1 and
NALP3, the activation of caspase 1, and the elevation of IL-1β
levels in ectopic lesions support our conclusion that ERβ
also enhances inflammasome activity in ectopic lesions for their
survival.

This Flag-ERβ-interacting protein network may not accurately
recapitulate the endogenous ERβ-interacting protein network in
endometriotic tissues because this network was generated by
overexpressed exogenous ERβ. To address this issue,
ERβ complexes were isolated from ectopic lesions of C57BL/6J
mice with endometriosis because these ectopic lesions had
elevated endogenous ERβ levels (Figure 1B). The SRC-1 isoform,
caspase 8, caspase 9, caspase 1, and ASK-1 were also
co-IPed with endogenous ERβ from ectopic lesions, as with
exogenous Flag-ERβ, although the IP efficiency of the ERβ
antibody (SC-8794, Santa Cruz) is low (Figure S4E). Therefore, we
concluded that the overexpressed ERβ complex is similar to
the endogenous ERβ complex in endometriotic tissues.

Gain of ERβ Function Prevents TNF-α-Induced
Apoptosis and Enhances Proliferation, Invasion, and
Adhesion Activities of Immortalized Human
Endometriotic Epithelial Cells

To investigate the functions of ERβ and the ERβ/SRC-1 isoform
complex, iHEECs stably expressing ERβ (iHEECs/ERβ) and the
SRC-1 isoform (iHEECs/SRC-1 Iso) were generated separately
and together (iHEECs/ERβ/SRC-1 Iso) (Figure 7A, bottom).
TNF-α treatment increased the levels of cleaved caspase 8
and cleaved caspase 3 in control iHEECs compared with vehicle



Figure 4. The Loss of ERβ Function Prevents Ectopic Lesion Growth

(A) Ectopic lesions isolated from C57BL/6J (WT) and ERβ−/− mice with endometriosis.

(B) IHC analyses and quantification of the ERβ levels in ectopic lesions isolated from WT and ERβ−/− mice with endometriosis.

(C-F) IHC and quantitative analyses of Ki-67 (C and E) and cleaved CSP8 (D and F) in the epithelial and stromal compartments of ectopic lesions (C and D) and
eutopic endometria (E and F) of WT and ERβ−/− mice with endometriosis.

In all panels, error bars represent ± SD.

Figure 5. The Gain of ERβ Function Stimulates Ectopic Lesion Growth

(A) Ectopic lesions isolated from control and ERβ:OE mice with endometriosis.

(B) Exogenous Flag/Myc-tagged human ERβ (F/M-hERβ) protein levels in the eutopic endometria of control and ERβ:OE mice with endometriosis.
mERβ, endogenous mouse ERβ.

(C-F) IHC and quantitative analyses of Ki-67 (C and E) and cleaved CSP8 (D and F) in the epithelial and stromal compartments of ectopic lesions (C and D)
and eutopic endometria (E and F) of control and ERβ:OE mice with endometriosis. Higher-magnification views of the boxed regions are shown.

(G) Exogenous Myc-tagged human ERβ (Myc-hERβ) protein levels in iHESCs/ERβ as determined with a Myc antibody.

(H and I) The quantification of relative changes in the mRNA levels of decidualization marker genes, IGFBP1 (H) and PRL (I), in iHESCs (Control) and
iHESCs/ERβ (ERβ:OE) upon estrogen/medroxyprogesterone/db-cAMP (ECP) treatment on the indicated day.

In all panels, error bars represent ± SD. See also Figure S3.

Figure 6. ERβ Interacts with TNF-α-Induced Apoptosis Complexes and the Inflammasome in Endometriotic Tissues of Mice with
Endometriosis

(A) Flag-ERβ complexes IPed with a Flag antibody from ectopic lesions of control and ERβ:OE mice with endometriosis followed by western blotting (WB) with
antibodies against ASK-1, STRAP, 14-3-3, CSP8, SRC-1, Flag, and tubulin.

(B and C) IHC and quantitative analyses of phospho-Thr845-ASK-1 (P-ASK-1) (B) and total ASK-1 (C) in control and ERβ:OE ectopic lesions.

(D) Western blot analyses of phospho-Thr845-ASK-1 (P-ASK-1), total ASK-1, ERβ, and tubulin in control and ERβ:OE ectopic lesions.

(E) IHC and quantitative analyses of cytochrome c levels in control and ERβ:OE ectopic lesions.

(F and G) Regression of ectopic lesion growth in endometriosis-induced C57BL/6J mice subcutaneously treated with Gossypol, PHTPP, or their combination
compared to vehicle (F). Quantification of ectopic lesion volume in (F) is shown in the graph (G).

(H) The IPed Flag-ERβ complex from ERβ:OE ectopic lesions with a Flag antibody or IgG followed by western blotting with antibodies against Flag, CSP9,
APAF-1, CSP1, and NALP3. *, non-specific protein.

(I) IHC and quantitative analyses of cleaved CSP9 levels in control and ERβ:OE ectopic lesions. Higher-magnification views of the boxed regions are shown.

(J) Ectopic lesions isolated from C57BL/6J (WT) and NALP3−/− mice with endometriosis.

(Figure 7A). However, TNF-α treatment did not enhance the
levels of the above apoptosis markers in iHEECs/ERβ, in
iHEECs/SRC-1 Iso, or in the combined iHEECs/SRC-1 Iso/ERβ
(Figure 7A). Therefore, ERβ and the ERβ/SRC-1 isoform complex
effectively prevented TNF-α-induced apoptosis. Gain of function
of ERβ, but not of the SRC-1 isoform, elevated the IL-1β
levels in iHEECs in the presence or absence of TNF-α treatment
(Figure 7A). These data, however, were generated via artificial
overexpression of ERβ. To confirm that the effects of ERβ gain
of function are not artifacts of overexpression, primary human
endometriotic stromal cells isolated from endometriosis patients
were employed because these cells also had elevated levels of
endogenous ERβ compared with normal human endometrial
stromal cells (Figure S4F). Primary human endometriotic stromal
cells also have elevated levels of IL-1β and anti-apoptosis
signaling upon TNF-α treatment compared with normal endometrial
stromal cells (Figure S4G). Therefore, ERβ plays a critical role in
anti-apoptosis signaling and inflammasome activation in ectopic
lesions. In addition, ERβ enhanced the cell adhesion and
proliferative activities of iHEECs compared with control cells in the
presence of TNF-α (Figures 7A and 7B). Therefore, it is likely
that the increased IL-1β observed with elevated ERβ induces
adhesion and proliferation activities of endometrial tissue
fragments in the peritoneal area of endometriosis patients to initiate
ectopic lesion development. In addition to IL-1β, the levels
of several cytokines (such as MIP-2, IL-16, MIP-1α, MCP-5,
TREM-1, and BLC) were significantly elevated in ERβ:OE
ectopic lesions compared with control ectopic lesions (Figures
S6A and S6C). Previous studies also revealed that levels of these
cytokines are elevated in the peritoneal fluid of women with
endometriosis (Ahn et al., 2015). In contrast, some cytokine
levels (MIG, M-CSF, TNF-α, KC, and IP-10) were reduced in
ERβ:OE ectopic lesions compared with control ectopic lesions
(Figures S6A and S6B). Collectively, the gain of ERβ function
broadly alters the cytokine milieu in ectopic lesions in concert
with promotion of endometriotic lesion growth. Consistent with
the SRC-1 isoform, ERβ also increased the expression of EMT
markers, such as Slug and Snail, and enhanced invasion activity
in iHEECs (Figures 7A and 7C). Therefore, the increased EMT
and invasion activity of ectopic lesions again occurs through
the effective cooperation of ERβ and the SRC-1 isoform.
However, vascular endothelial growth factor (VEGF) levels were not
changed in ERβ:OE ectopic lesions and iHEECs/ERβ compared
with their controls (Figures S6D and S6E). Therefore, the
angiogenesis of ectopic lesions is not regulated by ERβ.

To further support a role for ERβ gain of function in human
ectopic lesion development, iHEECs/Luc, iHEECs/ERβ/Luc,
and iHESCs/Luc were employed for non-invasive bioluminescence
imaging analysis of ectopic lesion growth in SCID mice.
To induce endometriosis, a mixture of iHESCs/Luc plus
iHEECs/ERβ/Luc was injected into ovariectomized SCID mice
with an E2 pellet. For controls, a mixture of iHESCs/Luc and
iHEECs/Luc was injected. Bioluminescence image analysis on
injection day 0 revealed that similar amounts of human
endometrial cells were injected into recipient SCID
mice in each group (Figure 7D). Comparative bioluminescent analysis
on the 21st day after injection revealed that human ectopic lesions
with ERβ overexpression exhibited stronger bioluminescent
activity compared with control ectopic lesions (Figure 7E).
Therefore, ERβ enhanced the in vivo survival rate of human
endometriotic cells and promoted their development into human ectopic
lesions in SCID mice.

In addition to ERβ, ERα plays an essential role in the
pathogenesis of endometriosis in the mouse model (Burns et al., 2012). To
determine the functional difference between ERα and ERβ in
endometriosis progression, iHEECs expressing Myc-tagged
human ERα (iHEECs/ERα) were generated (Figure S7A). In
contrast to iHEECs/ERβ, however, gain of ERα function did not
prevent TNF-α-induced apoptosis signaling and did not induce
IL-1β expression, proliferative activity, or expression of EMT
markers (Slug and Snail) and VEGF in iHEECs/ERα compared
with parental iHEECs upon TNF-α treatment (Figure S7A).

Unlike ERβ, therefore, ERα is not involved directly in the
evasion of immune surveillance or in the invasion and
IL-1β-mediated proliferation of ectopic lesions. When
ERα and ERβ were combined, ERα did not interfere with ERβ-mediated
anti-apoptotic activity in iHEECs upon TNF-α treatment (Figure S7B).
However, ERα inhibited ERβ-mediated IL-1β production
(Figure S7B). Therefore, ERα might be involved in the negative
regulation of ERβ-mediated inflammasome activation.

Taken together, the gain of ERβ and SRC-1 isoform function in
endometrial fragments generated by retrograde menstruation
prevents TNF-α-induced apoptosis complex activity to evade
immune surveillance. After evasion, ERβ interacts with the
inflammasome complex to induce IL-1β in endometrial fragments
that have evaded immune surveillance to facilitate attachment to
and growth at target sites (Figure 7F). In addition, ERβ also
induces EMT and invasion activity in cooperation with the SRC-1
isoform to establish endometriotic lesions (Figure 7F).

DISCUSSION 

ERβ Has Non-genomic Action for Anti-apoptosis and
Inflammasome Activation

The physiological effects of estrogen are mediated by estradiol
binding to one of the ER isoforms, ERα and ERβ. The estrogen-
liganded ER isoform then binds to specific DNA sequences
called estrogen response elements. Interestingly, phenotype
analyses of ERα−/−, ERβ−/−, and ERα−/−:ERβ−/− bigenic mouse
models have revealed that ER isoforms have overlapping but
also unique roles in estrogen-dependent action in vivo (Walker
and Korach, 2004). Consistent with their unique functions, ERα
and ERβ have different transcriptional activities in certain ligand,
cell-type, and promoter contexts. In endometriotic tissues,
both ER isoforms are expressed
and required for endometriotic lesion growth. The gain of ERβ



(K) IHC and quantitative analyses of IL-1β levels in control and ERβ:OE ectopic lesions.

(L) Western blot analyses of levels of IL-1β, CSP1, Flag-tagged ERβ, and tubulin (as a protein loading control) in ectopic lesions of control and ERβ:OE mice with
surgically induced endometriosis.

In all panels, error bars represent ± SD. See also Figures S4 and S5.

Figure 7. Gain of ERβ Function Prevents TNF-α-Induced Apoptosis Signaling but Stimulates
Proliferation, Adhesion, and Invasion Activities of Human Endometriotic Cells

(A) Levels of cleaved CSP8, cleaved CSP3, IL-1β, Ki-67, Slug,
Snail, and SRC-1 isoform (determined by a Flag antibody); ERβ
(determined using a Myc antibody); and tubulin in iHEECs
(Control), iHEECs/SRC-1 Iso (SRC-1 Iso), iHEECs/ERβ (ERβ),
or iHEECs/SRC-1 Iso/ERβ (SRC-1 Iso+ERβ) upon 50 ng/ml
TNF-α plus 10 μg/ml cycloheximide treatment for 0 and 8 hr.

(B) Cell-adhesion activities of parental iHEECs (Control) and
iHEECs/ERβ (ERβ) against various extracellular matrices in the
presence of 50 ng/ml TNF-α.

(C) Invasion activities of iHEECs (Control) and iHEECs/ERβ
(ERβ) for 2 days using a Transwell plate assay. The amounts of
invasive cells in each group were determined using a crystal
violet staining protocol and are shown in the graph.

(D and E) Bioluminescence and quantitative analyses of
iHEECs/Luc (Control) and iHEECs/ERβ/Luc (ERβ) in SCID mice
at 0 (D) and 21 (E) days after the induction of endometriosis.

(F) Working model for the non-genomic action of ERβ in
endometriosis progression.

In all panels, error bars represent ± SD. See also Figures S6
and S7.

function study revealed that ERβ prevents apoptosis signaling
and enhances adhesion, invasion, proliferation, inflammasome
activity, and inflammation signaling in ectopic lesions for their
growth. A previous study of mice with endometriosis revealed
that ERα drives proliferation, adhesion, and angiogenesis
and also modulates inflammation signaling in ectopic lesions
(Burns et al., 2012). Collectively, to establish endometriotic
lesions, ERα and ERβ might synergistically contribute to the
regulation of proliferation, adhesion, and inflammation signaling
in ectopic lesions. However, ERα mainly drives angiogenesis,
whereas ERβ has a predominant role in anti-apoptosis, inflammasome
activation, and invasion in ectopic lesions for their
survival.

Based on a retrograde menstruation model for endometriosis
(Hughesdon, 1958), endometrial tissue and erythrocytes are
shed through the fallopian tubes into the peritoneal cavity during
menses. In healthy women, refluxed endometrial fragments that
appear during retrograde menstruation are cleared by
inflammation-mediated cell-death signaling, such as caspase-1-mediated
pyroptosis (Miao et al., 2011). However, the immune system of
endometriosis patients fails to clear
the refluxed endometrial fragments, which potentiates the
development and severity of endometriosis. For survival,
endometrial tissue fragments must evade the immune surveillance
system, particularly peritoneal macrophages (Nasu et al.,
2009). During the early steps of evasion from the immune
surveillance system, ERβ generates a cytoplasmic protein network
to rapidly prevent TNF-α-mediated apoptosis by inactivating
TNF-α-induced apoptosis complexes I and II and the apoptosome.
We believe that a key synergism exists between ERβ
and the SRC-1 isoform during this evasion of the immune
surveillance system because the SRC-1 isoform also prevents
TNF-α-induced apoptosis in ectopic lesions. Our combined
observations lead us to propose that ERβ and the SRC-1 isoform
act cooperatively to effect a potent anti-apoptotic state
in endometriotic tissues.

The formation of the inflammasome and the activity of
caspase 1 determine the balance between pathogen resolution
and disease pathology. How is the inflammasome involved in
the pathogenesis of endometriosis? The NALP3 gene has an
essential role in endometriosis progression because NALP3−/−
mice have a defect in ectopic lesion growth after endometriosis
induction. ERβ is involved in the upregulation of the NALP3
inflammasome in hepatocellular carcinoma cells upon estrogen
stimulation, even though an interaction of ERβ with NALP3 has not
been clearly demonstrated (Wei et al., 2015). Here, we revealed
that ERβ interacts with inflammasome components and enhances
inflammasome activity through the activation of caspase 1.
Activation of the inflammasome results in highly elevated IL-1β
levels in endometriotic tissues compared with normal endometrium,
and enhanced IL-1β signaling can influence the adhesion
activity of endometriotic tissues and proliferative activities of
human endometrial cells.

The Gain of ERβ Function May Lead to Female Infertility

One of the primary symptoms associated with endometriosis is
dysfunction of the normal endometrium, leading to
endometriosis-associated infertility (Holoch and Lessey, 2010). In
addition to ectopic lesions, we found that eutopic endometrium
demonstrated elevated levels of ERβ compared with normal
endometrium. We believe that ERβ overexpression could
increase endometriosis-associated infertility by preventing the
decidualization response in the stromal compartment of
eutopic endometrium. Thus, targeting ERβ could have dual
potential benefits in patients with endometriosis: regression
of ectopic lesion growth and enhancement of fecundity of
women with endometriosis.

A Combination Therapy Using Antagonists of ERβ and
the SRC-1 Isoform Represents a Proof-of-Principle for
the Next Generation of Endometriosis Therapy

Inhibitors of estrogen signaling and estrogen synthesis as well as
inflammatory inhibitors (COX-2 inhibitors) have been employed,
given the dependence of ectopic lesions on estrogen and the
inflammatory response. However, these treatments can be
associated with undesirable side effects. In addition to
inducing an infertile state in young women, long-term estrogen
deficiency therapies can have harmful side effects on other
estrogen target tissues, such as the brain and bone (Shah et al.,
1987; Vanderschueren et al., 1997). Therefore, a greater choice
of alternative therapies that more specifically target the causal
mechanisms of endometriosis is needed.

Our observations suggest that targeting ERβ activity
should increase the specificity and efficiency of endometriosis
treatment and could be an alternative combinational approach
for endometriosis treatment in lieu of current estrogen-deficiency
therapy. As a proof-of-principle, the application of an
ERβ-selective antagonist, such as PHTPP, significantly
suppressed ectopic lesion growth by inhibiting ERβ activity in
ectopic lesions of mice with endometriosis without side effects
on fertility. The minimal inhibitory effects of PHTPP against
uterine ERα could be another advantage that minimizes side
effects. We note that a previous study stated that ERB-041, an
ERβ-specific agonist, caused regression of ectopic lesion
growth in an endometriosis animal model system (Harris et al.,
2005). The reason for this discrepancy could potentially be
related to differential ERβ expression in ectopic lesions. The
expression of ERβ was not detected in human ectopic lesions
that developed in athymic nude mice in the study by Harris et al.
PHTPP treatment demonstrated certain differential effects in
endometriotic tissues compared with ERβ knockout tissues. For example,
proliferation of eutopic endometrium and apoptosis in the
stromal compartment of ectopic and eutopic endometrium were
differentially regulated between them. This differential regulation
might be attributable to the differences between pharmacological
inhibition and genetic knockout.

Collectively, we propose that the SRC-1 isoform/ERβ complex could be
a next-generation endometriosis therapeutic target with reduced side
effects compared to current endometriosis treatment because (1) ERβ
and the SRC-1 isoform show endometriotic tissue-specific expression
but have little expression in normal endometrium; (2) both play an
essential role in the early stages of endometriosis pathogenesis; and
(3) targeting both of these drivers allows marked suppression of
ectopic lesion growth in animals compared with either individual
agent alone.

EXPERIMENTAL PROCEDURES
Mouse Information 

Five-week-old normal (C57BL/6J), ERβ−/− (B6;129P2-Esr2tm1Unc/J),
NALP3−/− (B6.129S6-Nlrp3tm1Bhk/J), and SCID (NOD.CB17-Prkdcscid/J)
mice were purchased from Jackson Laboratory. ERβ:OE and ERBAI

mice were generated. The R0SA‘“^‘“-^^'^^'^:PR^''®'''^ mice were generated by 
crossing R0SA‘“^‘“'^^‘^^'^ with PR‘^'‘®''+ mice (Soyal et al., 2005). All animal care 
was controlled by the ethical regulations approved by the Institutional Animal 
Care and Use Committee at Baylor College of Medicine. 

Immortalized Human Endometrial Cells 

IHESCs and EMosis-CC/TERT1 (immortalized human endometriotic epithelial 
cells) were employed and confirmed by short tandem repeat profiling; these 
cells were not contaminated with mycoplasma. 

Surgically Induced Endometriosis 

Endometriosis in mice was surgically induced under aseptic conditions under 
anesthesia. Details on surgically induced endometriosis are found in the Sup- 
plemental Experimental Procedures. 

Generation of ERBAI and ERβ:OE Mice

Details on these mice are found in the Supplemental Experimental Procedures. 

In Vivo Analysis of Human Ectopic Lesion Growth in SCID Mice 

Human ectopic lesions developed with IHESCs/Luc plus IHEECs/Luc (or
IHEECs/ERβ/Luc) in SCID mice were imaged by bioluminescence. Details
are found in the Supplemental Experimental Procedures. For basic
procedures, see the Supplemental Experimental Procedures.

Statistical Analyses 

The data are expressed as the mean ± SD. Significance was assessed
using an independent two-tailed Student's t test; a p value of less
than 0.1 was considered statistically significant. NS, non-specific.
*p < 0.1, **p < 0.01, ***p < 0.005 by Student's t test.

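The statistical procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the measurements are made-up placeholder values, the t test is the equal-variance (pooled) form, the p value is obtained by numerically integrating the t density rather than by any particular statistics package, and the star thresholds are the ones quoted in this section.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Independent two-sample Student's t test with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    df = na + nb - 2
    return t, df, two_tailed_p(t, df)


def significance_label(p):
    """Star notation using the thresholds quoted in this section."""
    if p < 0.005:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.1:
        return "*"
    return "NS"


# Hypothetical measurements for two groups (placeholders, not data from the study).
control = [10.2, 11.5, 9.8, 10.9, 11.1]
treated = [7.9, 8.4, 9.1, 7.5, 8.8]

t, df, p = student_t_test(control, treated)
print(f"control: {mean(control):.2f} ± {stdev(control):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
print(f"t = {t:.3f}, df = {df}, p = {p:.4f} ({significance_label(p)})")
```

In practice a statistics library would normally replace the hand-written integration; it is written out here only so the sketch has no external dependencies.
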
SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental
Procedures and seven figures and can be found with this article
online at http://dx.doi.org/10.1016/j.cell.2015.10.034.

SUMMARY

Self-tolerance to immune reactions is established via 
promiscuous expression of tissue-restricted anti- 
gens (TRAs) in medullary thymic epithelial cells 
(mTECs), leading to the elimination of T cells that 
respond to self-antigens. The transcriptional regu- 
lator Aire has been thought to be sufficient for the in- 
duction of TRAs, despite some indications that other 
factors may promote TRA expression in the thymus. 
Here, we show that the transcription factor Fezf2 
directly regulates various TRA genes in mTECs inde- 
pendently of Aire. Mice lacking Fezf2 in mTECs dis- 
played severe autoimmune symptoms, including 
the production of autoantibodies and inflammatory 
cell infiltration targeted to peripheral organs. These 
responses differed from those detected in Aire-defi- 
cient mice. Furthermore, Fezf2 expression and Aire 
expression are regulated by distinct signaling path- 
ways and promote the expression of different clas- 
ses of proteins. Thus, two independent factors, 
Fezf2 and Aire, permit the expression of TRAs in 
the thymus to ensure immune tolerance. 



INTRODUCTION 

In all vertebrates, T cells are generated in the thymus, and their
antigen receptors are generated through random somatic DNA
rearrangement, which provides a molecular basis for recognizing and
eliminating an essentially unlimited number of pathogens (Anderson
and Takahama, 2012; Hogquist et al., 2005; Klein et al., 2014).
However, this process inevitably gives rise to the production of
self-reactive T cells. To prevent autoimmune disease, immune
reactions against self-antigens must be suppressed through central
and peripheral tolerance (Sakaguchi et al., 2006). The repertoire of
T cell receptors (TCRs) that react potently to self-components is
removed in the thymus, a process that constitutes the fundamental
mechanism of central tolerance (Kappler et al., 1987).


T cell differentiation and repertoire selection are dependent 
on the thymic microenvironment, mainly supported by thymic 
epithelial cells (TECs) and dendritic cells (Gallegos and Bevan, 
2004; Hinterberger et al., 2010). After positive selection by 
cortical thymic epithelial cells (cTECs) based on the proper affin- 
ity to major histocompatibility complex (MHC) molecules, T cells 
migrate to the medulla and undergo negative selection (Klein 
et al., 2014). mTECs promiscuously express peripheral tissue- 
restricted antigens (TRAs) (Derbinski et al., 2001) and T cells 
that potently react to these antigens are eliminated by apoptosis. 
The transcriptional regulator Aire has been shown to be essential
for promiscuous TRA expression and negative selection (Liston et
al., 2003; Mathis and Benoist, 2009), since autoimmune symptoms
manifest in both Aire-deficient (Aire−/−) mice (Anderson et al.,
2002) and patients with autoimmune polyendocrinopathy candidiasis
ectodermal dystrophy (APECED) caused by mutations of the AIRE gene
(Akirav et al., 2011). Aire expression is regulated by members of
the tumor necrosis factor receptor superfamily such as Tnfrsf11a
(also known as receptor activator of nuclear factor-κB; RANK) and
CD40. In the more than 10 years since the discovery of AIRE
(Aaltonen et al., 1997; Nagamine et al., 1997), TRA expression in
the thymus has been thought to be exclusively dependent on Aire.
However, there are TRA genes that are induced in the absence of Aire
(Derbinski et al., 2005), raising the possibility that other
transcriptional regulators are functional in the induction of
self-antigens in mTECs.

Here, we investigated the transcriptional regulators of TRAs in
mTECs and report the identification of the key transcription factor
Fezf2 (also known as Zfp312 and Fezl). Fezf2 plays an indispensable
role in the expression of TRAs and the establishment of immune
tolerance independently of Aire. The parallel RANK/CD40 and
lymphotoxin beta receptor (LTβR) signaling pathways were shown to
regulate the expression of Aire and Fezf2, respectively. Thus, these
findings constitute an important advance in the current
understanding of immune tolerance and the mechanisms underlying
autoimmune diseases.

RESULTS 

Fezf2 Is Highly Expressed in Mouse and Human mTECs 

As mTECs express TRAs in the absence of Aire (Derbinski et al., 
2005), we hypothesized that there are additional transcriptional

Figure 1. Loss of Fezf2 in the Mouse mTEC Leads to Autoimmunity

(A) qRT-PCR analysis of Fezf2 mRNA expression (relative to Gapdh) in various cell types in the mouse thymus (3-week-old, n = 6). Error bars indicate the
mean ± SD. ND, not detected.

(B and C) Flow cytometric analysis of Fezf2 and Aire protein expression in mTECs (B) or in the mTEClo (MHC IIlo) and mTEChi (MHC IIhi) populations (C). mTECs
were identified as EpCAM+ and MHC class II+ (MHC II+) among the CD45− and Ly51− cells. Gray plots, the negative control.

(D and E) Immunohistochemical analysis of Fezf2 expression in keratin 5+ (Krt5+) mTECs (D) and Fezf2 and Aire expression in the medulla (E) of the mouse thymus.

(F) FEZF2 expression in KRT5+ mTECs in human fetal thymus (29-week-old, male). The bottom panels are enlarged views of the upper panels. Scale bars in
(D)-(F), 100 μm.

(G) Experimental design used for thymus transplantation (top) and macroscopic images of the thymus-grafted nude mouse (bottom left) and kidneys (bottom 
right). The arrowhead indicates the grafted thymus under the renal capsule. 

(H) Morphological analysis of grafted thymus (H&E staining). 

(I) CD4/CD8 differentiation of thymocytes in WT/nu and KO/nu mice (n = 3). 

(J) Gross anatomy (top) and weight (bottom) of spleens from the WT/nu and KO/nu mice (n = 9). Scale bar, 5 mm. Data are shown as the mean ± SD. **p < 0.01.

(K) Structure of the spleen and lymph nodes of WT/nu and KO/nu mice (H&E staining). Arrowheads, follicles in the spleen; TZ, T cell zones in the inguinal lymph 
nodes. 

(legend continued on next page)

regulators that are responsible for the induction of TRAs in the
mTEC. To identify the genes selectively expressed in mTECs, we
analyzed the GeneChip data on the mRNA isolated from mTECs and
cTECs. Among a group of genes that were expressed 20-fold higher in
mTECs than cTECs, only Fezf2 was categorized as a transcriptional
regulator (Figure S1A); thus, we focused on it as a potential
mTEC-specific transcriptional regulator. Intriguingly, the
expression level of Fezf2 was as high as that of Aire in mTECs
(Figure S1B).

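The selection step described above (genes expressed at least 20-fold higher in mTECs than in cTECs, then restricted to transcriptional regulators) can be sketched as a simple filter. Everything in the example is assumed for illustration: the expression values, the placeholder genes GeneA to GeneC, and the annotation set are hypothetical and do not come from the GeneChip data.

```python
def enriched_genes(expr_a, expr_b, min_fold=20.0):
    """Return sorted genes whose expression in expr_a is >= min_fold times expr_b."""
    hits = []
    for gene, level_a in expr_a.items():
        level_b = expr_b.get(gene)
        if level_b is None or level_b <= 0:
            continue  # skip genes without a usable comparison value
        if level_a / level_b >= min_fold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical normalized expression values (placeholders only).
mtec = {"Fezf2": 2400.0, "GeneA": 900.0, "GeneB": 150.0, "GeneC": 5000.0}
ctec = {"Fezf2": 40.0, "GeneA": 30.0, "GeneB": 100.0, "GeneC": 5100.0}

# Hypothetical annotation: which genes count as transcriptional regulators.
transcriptional_regulators = {"Fezf2", "GeneB"}

mtec_enriched = enriched_genes(mtec, ctec)
candidates = [g for g in mtec_enriched if g in transcriptional_regulators]
print(mtec_enriched)  # genes passing the 20-fold cutoff
print(candidates)     # those that are also annotated as regulators
```

The same function with `min_fold=4.0` would express a four-fold cutoff between two genotypes instead of two cell types.
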
Fezf2 was originally discovered in the developing forebrain of 
Xenopus and zebrafish (Eckler and Chen, 2014). Fezf2 contains six
evolutionarily conserved zinc finger domains (Figure S1C)
and has been shown to be a neuronal transcription factor 
required for the differentiation and specification of corticospinal 
neurons and other subcerebral projection neurons in the brain 
(Shim et al., 2012), but no immunological function has been re- 
ported to date. 

To examine Fezf2 gene expression in the mouse thymus, we 
performed qRT-PCR analysis and showed that Fezf2 mRNA 
was specifically expressed in mTECs among the various thymic 
cell types (Figure 1A). The expression of Fezf2 was developmentally
induced during the perinatal period (Figure S1D). The development of
mTECs is characterized by the upregulation of CD80 and MHC class II
(Gray et al., 2006), the expression of which determines whether
mTECs belong to the mTEChi or the mTEClo population. mTEChi cells
functionally produce thousands of TRAs for the negative selection of
T cells (Derbinski et al., 2005; Sansom et al., 2014). Flow
cytometric analysis revealed that Fezf2 is expressed in 30% of the
mTEClo population and almost all of the mTEChi population (Figures
1B and 1C). Immunostaining indicated that Fezf2 is expressed by
mTECs positive for keratin 5, but not cTECs (Figure 1D), while
almost all of the Aire-expressing cells were positively stained for
Fezf2 (Figure 1E). When we transiently overexpressed Fezf2 and Aire
in an mTEC line, the Fezf2 protein localized specifically to the
nucleus but did not colocalize with Aire (Figure S1E). In the human
thymus, immunohistochemical analysis showed that FEZF2 was
selectively expressed in keratin 5-positive mTECs (Figure 1F), and
AIRE was expressed in the FEZF2-positive cells (Figure S1F). These
data suggest that Fezf2 plays a functional role in mouse and human
mTECs.

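The qRT-PCR values above are reported relative to Gapdh. The exact normalization is not spelled out in this section; a common choice, assumed here purely for illustration, is the 2^(−ΔCt) calculation with ideal amplification efficiency. The Ct values below are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene, as 2^-(dCt).

    Assumes ideal (100%) amplification efficiency for both primer pairs.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical threshold-cycle values for one sample (not measured data).
ct_fezf2 = 25.0
ct_gapdh = 20.0

print(relative_expression(ct_fezf2, ct_gapdh))  # 2^-5 = 0.03125
```

A target that crosses threshold five cycles after the reference is therefore about 3% as abundant under this assumption.
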
Fezf2 Deficiency Leads to the Infiltration of 
Inflammatory Cells 

Fezf2-deficient (Fezf2−/−) mice have a brain development defect and
fail to survive after weaning due to an inability to consume solid
food (Hirata et al., 2004). The mice die at 4 weeks of age, prior to
the obvious manifestation of immunological symptoms. Thus, we tested
the immunological function of thymic stromal cells by transplanting
hematopoietic-cell-depleted thymuses under the renal capsule of nude
mice (Anderson et al., 2002). Eight weeks later, successful
engraftment was observed in the nude mice grafted with the Fezf2−/−
thymus (KO/nu) as well as in the nude mice grafted with control
littermate thymus (WT/nu) (Figure 1G). There was no difference in
Aire expression (Figure S1G) or thymic size between the KO/nu and
WT/nu mice (Figure 1H). Flow cytometric analysis revealed normal
differentiation of CD4 and CD8 thymocytes in the engrafted thymuses
(Figure 1I), while the number of Foxp3+ regulatory T (Treg) cells
was slightly decreased in KO/nu mice (Figure S1H). KO/nu mice
exhibited splenomegaly with an increased number of follicles as well
as an expansion of the T cell zone in the lymph nodes (Figures 1J
and 1K). An infiltration of inflammatory cells, including
lymphocytes, was observed in peripheral tissues, including the lung,
liver, kidney, and small intestine (Figures 1L and 1M). However,
there was no obvious inflammatory cell infiltration into the
pancreas or retina (Figure S1I), which are highly affected in
Aire−/− mice (Anderson et al., 2002).

Thymocyte and mTEC Development in Fezf2−/− Mice

To examine the function of Fezf2 in the development of mTECs and
thymocytes, Fezf2−/− mice were analyzed before weaning, at 3 weeks
of age. Keratin 5-positive mTECs were abnormally distributed in
clusters in the thymic medulla of the Fezf2−/− mice as compared to
the wild-type (WT) mice (Figures 2A, S2A, and S2B). The cTEC number
was unchanged (Figure S2C), but the mTEC number and the ratio of
mTECs (CD80+ cells) to total TECs (EpCAM+CD45− cells) were decreased
in Fezf2−/− mice (Figures 2B, 2C, and S2D). However, there was no
significant difference in the ratio of mTEChi and mTEClo (Figures 2D
and S2E) or in Aire protein or mRNA expression in mTECs (Figures 2E
and 2F) between WT and Fezf2−/− mice. Collectively, these results
suggest that Fezf2 is not involved in the regulation of Aire
expression in mTECs or the mTEChi/lo ratio, although thymic
organization and the mTEC number are slightly influenced by Fezf2
deficiency. The differentiation of thymocytes into CD4+ or CD8+
T cells was normal (Figure 2G), while the ratio of Foxp3+ to CD4+
thymocytes was modestly decreased in Fezf2−/− mice (Figure 2H). We
found that Fezf2−/− mice exhibited shifts in the CD4+ and CD8+
TCRVβ repertoire in the thymus (Figure 2I), suggesting a crucial
role of Fezf2 in shaping the TCR repertoire of CD4+ and CD8+
T cells.

Fezf2 Controls TRA Expression Independently of Aire 

How does Fezf2 contribute to thymic gene expression? We performed a
genome-wide analysis of mRNAs expressed in the mTECs isolated from
Fezf2−/− mice and selected genes that were expressed more than four
times higher in the WT than the Fezf2−/− mTECs (Figure 3A). All of
the 16 Fezf2-dependent genes belonged to TRAs, which are expressed
in specific peripheral tissues according to the BioGPS gene
annotation portal (http://biogps.org/#goto=welcome), and are thus
regarded as Fezf2-dependent TRAs (Table S1). Interestingly, five of
these Fezf2-dependent TRAs (Krt10, Resp18, Fabp9, Maoa, and



(L) Inflammatory cell infiltration in the lung, liver, kidney (arrowheads), and small intestine (S. int) of the KO/nu mice (H&E staining). 

(M) Schematic of inflammatory cell infiltration into the peripheral tissues of WT/nu and KO/nu mice. Each hexagon represents a single grafted mouse (n = 9). The 
blue triangle indicates the detection of inflammatory cell infiltration in the organ. Scale bars in (H), (K), and (L), 200 μm.

See also Figure S1.

Figure 2. Fezf2 Shapes the Repertoire of T Cell Receptors

(A) Distribution of mTECs and Aire protein expression in the WT and Fezf2−/− thymus. C and M represent the cortex and medulla, respectively, and the
arrowheads indicate clustered Krt5-positive mTECs. Scale bars, 100 μm.

(B) The ratio of the total mTECs (CD80+ cells) to TECs (CD45−EpCAM+ cells) (WT, n = 8; Fezf2−/−, n = 9).

(C) The total mTEC number in WT and Fezf2−/− mice (n = 8 per genotype).

(D) The ratio of mTEChi and mTEClo (MHCIIhi and MHCIIlo) among the total mTECs (CD45−Ly51−MHCII+EpCAM+ cells) (WT, n = 3; Fezf2−/−, n = 4).

(E) Aire protein expression in mTECs in WT and Fezf2−/− mice. The numbers above the lines indicate the ratio of Aire+ cells to the total mTECs (WT, n = 6; Fezf2−/−,
n = 8).

(F) Aire mRNA expression in mTECs (WT, n = 10; Fezf2−/−, n = 8).

(G) The normal differentiation of thymocytes into CD4+ or CD8+ T cells (WT, n = 10; Fezf2−/−, n = 8).

(H) The ratio of Foxp3+ cells to CD4+ thymocytes (WT, n = 6; Fezf2−/−, n = 5).

(I) The usage of TCRVβ on CD4+ or CD8+ T cells in the thymus (WT, n = 10; Fezf2−/−, n = 8; Aire−/−, n = 5). Data are shown as the mean ± SD. NS, not significant
(p > 0.05); *p < 0.05; **p < 0.01.

See also Figure S2.



Timd2) had already been reported to be induced in the absence of
Aire, i.e., to be Aire-independent TRA genes (Derbinski et al.,
2005). These data suggest that Fezf2 regulates the expression of a
distinct subset of TRAs in the thymus. On the other hand, there were
no significant differences in the expression of MHC class I and II
molecules (H2-D, H2-K, and H2-A, H2-E, respectively), costimulatory
molecules (Cd80 and Cd86), invariant chain (Cd74), chemokines (Xcl1
[Lei et al., 2011] and Ccl21 [Laan et al., 2009]) or the mTEC
terminal differentiation marker Ivl (Nishikawa et al., 2010)
(Figures S3A-S3E).

qRT-PCR analysis revealed that mRNA expression of the
Fezf2-dependent TRAs was not reduced in the mTECs isolated from
Aire−/− mice (Figure 3B). The expression of representative
Aire-dependent genes (Spt1, Ins2, Mup4, and S100a8)

Figure 3. Fezf2 Regulates a Subset of TRA Genes in an Aire-Independent Manner

(A) Fezf2-dependent genes in mTEC based on mRNA expression profiling of mTECs from WT and Fezf2−/− (KO) mice (n = 5 per genotype). Listed
are the 16 most highly downregulated genes in Fezf2−/− mTECs compared with WT mTECs. The genes that have been reported as Aire-independent
TRA genes are shown in red. Genes expressed in a single tissue are highlighted in blue shading.

(B) Comparison of mRNA expression of Fezf2-dependent TRA genes in WT, Aire−/−, and Fezf2−/− mTECs.

(C) The mRNA expression of representative Aire-dependent TRA genes in WT, Aire−/−, and Fezf2−/− mTECs. The data in (B) and (C) are shown as the
mean ± SD, *p < 0.05 versus WT (WT, n = 4; Fezf2−/−, n = 3; Aire−/−, n = 5); ND, not detected.

(D) Chromatin immunoprecipitation (ChIP) assay. Fezf2 binds to the promoter region of Fezf2-dependent but not Aire or Aire-dependent TRA
genes.

(E) A comparison of the Fezf2-dependent and Aire-dependent genes. Most of the Fezf2-dependent genes are different from the Aire-dependent genes
collected from the database (GeneChip data: GSE69105 and GSE85; fold change [FC] > 1.5, p < 0.05).

(F) Luciferase assay in an mTEC line 1C6. The expression of Fezf2-dependent TRA genes (Maoa, Calb1, and Nol4) is controlled by Fezf2, but
not by N-terminus-deleted Fezf2 (Fezf2ΔN) or C-terminus-deleted Fezf2 (Fezf2ΔC). Luciferase vectors: empty pNluc vector (NL) or pNluc vectors
containing the promoter region of Maoa, Calb1, or Nol4. Data in (D) and (F) are shown as the mean ± SD, *p < 0.05, **p < 0.01, ***p < 0.001.

See also Figure S3.




(Waterfield et al., 2014) was not significantly decreased in
Fezf2−/− mice (Figure 3C). In addition, microarray analysis revealed
that the downregulated genes in the Fezf2−/− mTECs were for the most
part different from those downregulated in Aire−/− (Figure 3D). The
expression of most of the Fezf2-dependent genes was higher in mTEChi
than mTEClo (Figure S3F), consistent with the higher expression of
Fezf2 in mTEChi (Figure 1C). Bioinformatics analysis revealed that
Aire and/or Fezf2 control over 60% of the mTEC-specific TRAs (Figure
S3G). The genes downregulated in Fezf2−/− mTECs included certain
TRAs related to autoimmune or neoplastic diseases (Endo et al.,
2009; Fatourou and Koskinas, 2009; Roulois et al., 2013) (Figure
S3H). These results show that Fezf2 regulates a unique subset of TRA
genes independently of Aire.

A recent global chromatin immunoprecipitation sequencing (ChIP-seq)
analysis revealed that Fezf2 binds to the promoter of numerous
protein-coding genes in cortical progenitors (Lodato et al., 2014).
Our ChIP assay demonstrated that Fezf2 directly bound to the
promoter region of Fezf2-dependent TRA genes, but not to that of the
Aire or Aire-dependent TRA genes in mTECs (Figure 3E). Additionally,
we found Fezf2 bound to the promoter region of certain TRA genes
such as Mbp, Gad1, Col2a1, and Muc1, which are known to be
autoantigens or tumor

Figure 4. Analyses of Lymphoid Organs in TEC-Specific Fezf2-Deficient Mice

(A) Localization of mTECs in the thymic medulla in Foxn1-Cre+ Fezf2fl/− mice. The arrowhead indicates
the clustered mTECs. Scale bars, 100 μm.

(B) The ratio of mTECs (CD45−, EpCAM+, and CD80+ cells) to the total TECs (n = 8 per genotype).

(C) Aire protein expression in mTECs (n = 6 per genotype).

(D) CD4/CD8 differentiation of thymocytes (n = 12 per genotype).

(E) The ratio of Foxp3+CD25+ CD4 T cells in the thymus (n = 12 per genotype).

(F) The increased number of T cells in spleen and lymph nodes (LN) in Foxn1-Cre+ Fezf2fl/− mice (n = 4
per genotype).

(G) The elevated ratio of CD62LloCD44hi effector/memory T cells in Foxn1-Cre+ Fezf2fl/− mice (n = 6
per genotype).

(H) The reduced ratio of Foxp3+CD25+ CD4 T cells in the LN of Foxn1-Cre+ Fezf2fl/− mice (n = 6 per
genotype).

(I) The serum Ccl2 (MCP-1) level measured by cytometric bead assay (n = 5 per genotype).
Data are shown as the mean ± SD. *p < 0.05; **p < 0.01.

See also Figure S4.

antigens (Figure S3I) (Ali et al., 2011; Belogurov et al., 2008;
Komatsu et al., 2014; Roulois et al., 2013). Luciferase reporter
assay revealed that Fezf2 regulates the expression of Maoa, Calb1,
and Nol4 genes (Figure 3F). Therefore, these data indicate
that Fezf2 directly binds to the promoter region and controls 
the gene expression of distinct TRAs in an Aire-independent 
manner. 
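
The comparison of Fezf2-dependent and Aire-dependent genes reported in this section amounts to a set comparison. The sketch below uses only the gene symbols named in the text as examples of each class; these lists are incomplete by construction, so the result illustrates the bookkeeping rather than reproducing the published analysis, and the function name is hypothetical.

```python
def compare_gene_sets(set_a, set_b):
    """Summarize the overlap between two collections of gene symbols."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(shared),
        # Jaccard index: 0.0 means no overlap, 1.0 means identical sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Gene symbols named in the text as examples (not the full gene lists).
fezf2_dependent = {"Krt10", "Resp18", "Fabp9", "Maoa", "Timd2", "Calb1", "Nol4", "Ttr"}
aire_dependent = {"Spt1", "Ins2", "Mup4", "S100a8"}

summary = compare_gene_sets(fezf2_dependent, aire_dependent)
print(summary["shared"])   # genes appearing in both example lists
print(summary["jaccard"])  # overlap as a fraction of the union
```

With the full lists from the expression data, the `shared` entry would hold any genes that depend on both factors.
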

The Generation and Immunological Analysis of
TEC-Specific Fezf2-Deficient Mice

We generated the TEC-specific Fezf2-deficient (Foxn1-Cre+ Fezf2fl/−)
mice and further examined the role of Fezf2 in autoimmunity. In
accordance with the findings in the Fezf2−/− mice, keratin
5-positive mTECs were also abnormally distributed in clusters in the
thymic medulla of Foxn1-Cre+ Fezf2fl/− mice (Figure 4A). Both the
mTEC number and the fraction of mTECs among the total TECs were
decreased (Figures 4B and S4A). However, the expression of Aire in
the mTECs (Figure 4C) and the ratio of mTEChi and mTEClo (Figure
S4B) were unaffected in Foxn1-Cre+ Fezf2fl/− mice. The
differentiation of thymocytes into CD4+ or CD8+ T cells in the
thymus was unaltered (Figure 4D). B cells differentiated normally in
the bone marrow, spleen, and lymph nodes (Figures S4C-S4E). There
was a modest reduction in the ratio of Foxp3+CD25+ to CD4+ cells in
the Foxn1-Cre+ Fezf2fl/− thymus (Figures 4E and S4F). The percentage
of CD4+ and CD8+ T cells was unchanged (Figure S4G), but the T cell
number was significantly elevated in the spleen and lymph nodes of
the Foxn1-Cre+ Fezf2fl/− mice (Figure 4F). The Foxn1-Cre+ Fezf2fl/−
mice showed an elevated proportion of CD44hiCD62L− effector/memory
T cells (Figure 4G), while the percentages of Foxp3+CD25+ Treg cells
were reduced in the lymph nodes (Figure 4H). The Foxn1-Cre+
Fezf2fl/− mice had a higher serum level of Ccl2 (also known as
Mcp-1) (Figure 4I), a chemokine that induces the migration and
infiltration of monocytes and T cells (Deshmane et al., 2009).

TEC-Specific Fezf2-Deficient Mice Display Autoimmune
Symptoms, Including Autoantibody Production and
Peripheral Organ Inflammation

We analyzed the infiltration of inflammatory cells, including
lymphocytes, into peripheral tissues, as well as autoantibodies in
the sera of Foxn1-Cre+ Fezf2fl/− mice. Inflammatory cell
infiltration was detected in the lung, liver, kidney, stomach, small
intestine, salivary gland, brain, and testis of Foxn1-Cre+ Fezf2fl/−
mice (Figure 5A). Thus, the spectrum of autoimmune target tissues
was found to be wider than that of Aire−/− mice. There was, however,
no obvious inflammatory cell infiltration into the pancreas or
retina, which is consistent with the observation in KO/nu mice
(Figure S1I). Foxn1-Cre+ Fezf2fl/− mice exhibited
hypergammaglobulinemia (Figure S5). To assess whether the sera from
KO/nu or Foxn1-Cre+ Fezf2fl/− mice contained autoantibodies, we
immunostained various tissues derived from Rag1−/− mice with the
sera. Antibody reactivity was detected in both KO/nu and Foxn1-Cre+
Fezf2fl/− sera against the lung, retina, skin, and joint tissues,
but the staining patterns were different from that of Aire−/− serum
(Figure 5B), suggesting that Fezf2−/− and Aire−/− mice have
autoantibodies recognizing distinct tissue antigens. The staining
pattern of the serum from Foxn1-Cre+ Fezf2fl/− mice was much broader
than that of Aire−/−. Approximately 30% of the TEC-specific
Fezf2-deficient mice (five of 16) died by 12 months, whereas all of
the Aire−/− mice survived for more than 16 months after birth
(20/20), implying that the autoimmune symptoms in the case of Fezf2
deficiency were more severe than those in Aire deficiency. The serum
from Foxn1-Cre+ Fezf2fl/− mice contained significantly increased
autoantibodies that recognize recombinant proteins of the
Fezf2-dependent TRAs, Ttr, Maoa, and Calb1, as compared to the sera
from WT and Aire−/− mice (Figure 5C). In contrast, the serum from
the Foxn1-Cre+ Fezf2fl/− mice was not reactive to Aire-dependent
TRAs, such as Irbp (DeVoss et al., 2006) (Figure 5C). Taken
together, we conclude that Fezf2 functions in mTECs to ensure
immunological tolerance against certain tissue antigens and
effectively suppresses the development of autoimmune responses.

Fezf2 Is Regulated by the LTβR Pathway, but Not by the
RANK/CD40 Pathways or Aire

The receptors of the TNF superfamily are required for the formation
of the thymic microenvironment and central tolerance. For example,
the RANK and CD40 signaling pathways are essential for mTEC
development and Aire expression (Akiyama et al., 2008). Therefore,
we examined the expression of Fezf2 in mice deficient in Tnfrsf11a
(encoding RANK) and Cd40. Fezf2-expressing cells were normally
observed upon immunohistochemical analysis and Fezf2 mRNA expression
was unaffected in Foxn1-Cre+ Tnfrsf11afl/− mice (Figures S6A-S6C) as
well as Cd40−/− mice (Figures S6D-S6F). Normal Fezf2 expression was
also observed in mice deficient in Tnfsf11 (encoding RANKL) (Figures
6A and 6B). To examine the possibility that Aire controls Fezf2
expression, we analyzed Aire−/− mice and found no difference in
Fezf2 expression between the WT and Aire−/− mice (Figures 6C-6E).
These results indicate that Fezf2 expression is not regulated by
either the RANK/CD40 pathways or Aire.



Both TNF-receptor-associated factor 6 (Traf6) and LTβR signaling
critically contribute to mTEC differentiation in the establishment
of the medullary microenvironment and central tolerance (Akiyama et
al., 2005; Boehm et al., 2003). Traf6−/− mice displayed a
significant reduction in Fezf2 mRNA and protein expression in the
thymus (Figures S6G and S6H). The ratio of mTECs to the total TECs
was significantly decreased in Ltbr−/− mice (Figure 6F), and Fezf2
mRNA and protein expression was decreased in the Ltbr−/− mTECs
(Figures 6G-6I). Consistent with this, the expression of the
Fezf2-dependent TRA genes Krt10, Fabp9, Calb1, and Nol4 was markedly
decreased (Figure 6J), while the expression of Aire-dependent TRAs
as well as Aire itself was unaffected in the Ltbr−/− mTECs (Martins
et al., 2008; Venanzi et al., 2007). Thus, Fezf2 expression is
induced by the LTβR pathway independently of the RANK/CD40 pathways
or Aire.

DISCUSSION 

The Discovery of a Key Transcription Factor Involved in 
the Establishment of Central Tolerance 

In this study, we have demonstrated that Fezf2, which is
exclusively expressed in mTECs in the thymus, regulates TRA
expression independently of Aire. Mice specifically lacking Fezf2 in
mTECs develop severe autoimmune phenotypes, with autoantibody
production and inflammatory cell infiltration.
Notably, Fezf2 and Aire play non-redundant roles in the TRA 
expression that is crucial for the elimination of self-reactive 
T cells (Figure 7). Thus, Fezf2 functions in the thymus as an 
mTEC-specific transcription factor required for the establish- 
ment of central tolerance, thereby playing a key role in the regu- 
lation of untoward immunological reactions. 

Distinct Function of the RANK/CD40 and LTβR Pathways
in the Expression of TRAs in mTECs

The TNF receptors RANK, CD40, and LTβR are required for central
tolerance and TRA expression (Akiyama et al., 2008; Martins et al.,
2008). The RANK and CD40 signaling pathways regulate Aire expression
and the development of mTECs, and Spi-B is involved in the
differentiation of mTEClo to mTEChi downstream of RANK signaling
(Akiyama et al., 2014). The LTβR signaling pathway is needed for the
formation of normal thymic architecture and mTEC differentiation
(Boehm et al., 2003; Venanzi et al., 2007; White et al., 2010), but
Ltbr deficiency does not affect either Aire expression or
Aire-dependent TRA gene expression in mTECs (Seach et al., 2008).
The evidence obtained in this study shows that Fezf2 is a
transcriptional regulator that is induced downstream of LTβR
signaling, and that Fezf2 regulates the expression of certain genes
(Figure 3) but is not involved in the differentiation of mTEClo to
mTEChi (Figures 2D and S4B). Fezf2-dependent TRA expression, but not
Aire-dependent TRA expression, was decreased in Ltbr−/− mTECs
(Figure 6J). Thus, RANK/CD40 and LTβR regulate the induction of
distinct TRA genes through Aire and Fezf2, respectively.

It will be of considerable interest to investigate why there are
two pathways for the establishment of central tolerance in
mTECs. The LTβR signaling pathway is essential for lymphoid

Figure 5. Tissue Infiltration and Autoantibody Production in TEC-Specific Fezf2-Deficient Mice

(A) Inflammatory cell infiltration in the peripheral tissues in the Foxn1-Cre+ Fezf2fl/− mice. Infiltration was detected in lung,
liver, kidney (arrowheads), and small intestine (S. int), but not in the retina or pancreas (H&E staining).

(B) Detection of autoantibody reactivity in the serum of Foxn1-Cre+ Fezf2fl/− mice. Sections of tissues from Rag1−/− mice
were stained with sera from WT/nu, KO/nu, Aire−/−, or Foxn1-Cre+ Fezf2fl/− mice followed by a secondary antibody labeled
with Alexa 555. Serum was collected from Aire−/− or Foxn1-Cre+ Fezf2fl/− mice at the age of 20 weeks. Scale bars in (A) and
(B), 200 μm.

(C) Detection of autoantibodies against Fezf2-dependent (Ttr, Maoa, and Calb1) or Aire-dependent (Irbp) TRAs in the serum
from Foxn1-Cre+ Fezf2fl/− mice (ELISA, n = 6 per genotype). Data are shown as the mean ± SD. *p < 0.05; **p < 0.01.

See also Figure S5.

organogenesis, while the RANK and CD40 signaling pathways are
required for bone homeostasis and the T-cell-mediated B cell
response, respectively (Locksley et al., 2001). Although the Fezf2
gene is conserved in all vertebrates (Eckler and Chen, 2014) that
have lymphoid organs (Hofmann et al., 2010; Takaba et al., 2013),
the Aire gene is conserved only in jawed vertebrates (Saltis et al.,
2008). During the course of evolution, the LTβR-Fezf2 axis might
have emerged as the primitive pathway for immune tolerance, while
the RANK/CD40-Aire axis emerged later to perform certain critical
functions not carried out by Fezf2.

Direct Transcriptional Regulation of TRA Genes by Fezf2

Aire does not have an obvious DNA binding domain and is not
regarded as an authentic transcription factor (Abramson et al.,
2010; Giraud et al., 2014). Aire interacts with several nuclear
factors such as hypomethylated histone H3 (Koh et al.,
2010) and the ATF7ip-MBD1 protein complex (Waterfield 
et al., 2014), unleashing the stalled RNA polymerases in order 
to epigenetically promote the ectopic transcription of TRA 
genes (Peterson et al., 2008). In contrast, Fezf2 has a DNA 
binding domain and directly regulates the expression of a 
large number of genes (Eckler and Chen, 2014). Although a 
recent global ChIP-seq analysis showed that Fezf2 binds to 
the promoters of a large number (over 10,000) of protein-cod- 
ing genes (Lodato et al., 2014), Fezf2 does not bind the pro- 
moters of Aire or Aire-dependent TRA genes, such as Ins2 
(Figures 3D and S3I). While Aire expression is mainly limited 
to mTECs, Fezf2 is expressed and plays a role in other organs, 
such as the brain. Further studies are necessary to understand 
how Fezf2 functions as an inducer of TRA in mTECs. One of 
the possibilities is that Fezf2 might interact with mTEC-spe- 
cific transcriptional regulator(s) that specifically modulate 
chromatin structure and mediate the tissue-specific function 
of Fezf2. 

By our bioinformatics analysis, most Fezf2-dependent genes 
did not overlap with Aire-dependent genes (Figure 3D). Although 
some TRA genes might be regulated by Aire and Fezf2 co-oper- 
atively, the majority of TRAs were found to be separately 
controlled by either Fezf2 or Aire in mTECs (Figure S3G). Since 
the expression of certain TRAs was unaffected in both Aire−/−
and Fezf2−/− mice, we cannot rule out the possibility that other
transcriptional regulator(s) contribute to TRA expression in
mTECs. 

The Spectrum of Autoimmunity in Fezf2-Deficient Mice 

In Fezf2 deficiency, inflammatory cell infiltration was observed
in several peripheral tissues, but not in the retina or pancreas,
which are often affected in Aire deficiency (Figure S1I) (Ander-
son et al., 2002). Antibodies in the sera from KO/nu mice re-
cognized peripheral tissues of Rag1-deficient mice, and the
staining pattern was largely complementary to that of Aire−/−
mice (Figure 5B). In addition, the serum from Foxn1-Cre+
Fezf2flox/− mice contained significantly increased autoantibodies
that recognize Fezf2-dependent but not Aire-dependent TRAs
(Figure 5B; Table S2). Fezf2 directly binds to the promoter re-
gion of Fezf2-dependent TRA genes in the mTECs, but not to
that of Aire-dependent TRA genes (Figure 3E). These
data strongly support the conclusion that the Fezf2-deficient
mice have autoimmune reactions in a selective and antigen- 
dependent manner, and the failure of clonal deletion causes 
the autoimmune phenotypes in Fezf2 deficiency, even though 
they exhibited small reductions in the mTEC and thymic Treg 
cell number and there was disorganization of mTECs in the me- 
dulla. However, since Aire is also involved in mTEC and thymic 
Treg cell development (Anderson and Takahama, 2012; Yang 
et al., 2015), it will be important in future studies to investigate 
how Fezf2 contributes to mTEC development and T cell selec- 
tion, including clonal deletion, anergy, and Treg cell differentia- 
tion in the thymus. 

Association of Fezf2-Dependent TRAs and FEZF2 
Mutations with Autoimmune and Neoplastic Diseases 

Fezf2-dependent genes included certain TRAs related to auto- 
immune diseases or neoplastic diseases (Figures 3A and S3G): 
Ttr is related to an autoantibody in rheumatoid arthritis (Sharma 
et al., 2014), Amy2a is involved in autoimmune pancreatitis 
and fulminant type 1 diabetes (Endo et al., 2009), and Afp and 
Muc1 are reported to be tumor antigens in several different
cancers (Fatourou and Koskinas, 2009; Roulois et al., 2013). 
Intriguingly, Fezf2-dependent TRAs are mostly categorized as 
intracellular or plasma membrane proteins. In contrast, many 
Aire-dependent TRAs are secretory proteins (Table S1), which
is consistent with the fact that patients with APECED are afflicted 
with endocrine disorders. 

It was demonstrated that FEZF2 is highly and specifically ex- 
pressed in mTECs among the various types of thymic cells in hu- 
mans. Although there is no report in the literature on autoimmune 
diseases caused by FEZF2 mutation, certain Fezf2-dependent 
TRAs are related to autoimmune diseases and cancers (Endo 
et al., 2009; Fatourou and Koskinas, 2009; Roulois et al., 2013; 
Sharma et al., 2014), and FEZF2 mutations have been associated
with autism (Kwan, 2013; Sanders et al., 2012) and tumors (Shu 
et al., 2013). This study has revealed the transcriptional program 
in the thymus governed by Fezf2 and Aire, both of which 
contribute in an essential manner to the suppression of auto- 
immune diseases. Thus, this study represents an important 
advance in our understanding of the molecular mechanisms un- 
derlying the establishment of central tolerance and the adaptive 
immune system. 

EXPERIMENTAL PROCEDURES 
Mice 

Fezf2+/− (CDB0498K) (Hirata et al., 2004) and Ltbr+/− (CDB0531K) (Mouri et al.,
2011) mice were obtained from the animal facility of the Center for Developmental
Biology, Riken, Japan (http://www2.clst.riken.jp/arg/mutantmicelist.html).
Aire+/− (stock number, #004743), Foxn1-Cre (#018448), and nude
mice (#000819) were purchased from the Jackson Laboratory. Fezf2flox/flox mice
(Han et al., 2011), Tnfsf11a−/− mice, and Rag1−/− mice were described previ-
ously. Mice were maintained under specific pathogen-free conditions and 
handled in accordance with the guidelines for animal experiments of the Uni- 
versity of Tokyo. 

Flow Cytometric Analysis and Cell Isolation 

The following fluorescence-conjugated antibodies were used in the flow cyto- 
metric analysis and cell sorting. Flow cytometric analysis was performed with a 
FACSCanto II (BD Biosciences) and FlowJo software (Tree Star). Thymic

Figure 7. A Schematic Model of the Thymic Program of TRA Expression
Regulated by Fezf2 and Aire

mTECs promiscuously express TRAs, which are dependent on Fezf2 and
Aire. The LTβR pathway regulates the expression of Fezf2, which
directly induces Fezf2-dependent TRAs by binding to the promoter of
these genes. The RANK/CD40 pathways regulate the expression of Aire,
which interacts with nuclear proteins (such as histone H3) and
controls the expression of Aire-dependent TRAs through epigenetic
mechanisms. The LTβR-Fezf2 and RANK/CD40-Aire pathways are both
required for the establishment of immune tolerance. PIC, preinitiation
complex (including RNA polymerase II and basal transcription factors).



430 2.0 array. Data were analyzed using GeneSpring and deposited
in the GEO database under accession number GSE69105.
Fezf2-dependent genes (GSE69105) and Aire-dependent genes (GSE85)
represented by common probes were selected from the platforms
(8,848 coding genes in total) and compared (cutoff: 1.5-fold
change, p < 0.05).
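
The cutoff described above can be illustrated with a minimal sketch. It assumes that "dependent" means reduced at least 1.5-fold in the knockout relative to the control with p < 0.05; the function name, the direction of the change, and all numerical values are hypothetical and are not taken from the deposited data sets.

```python
def dependent_genes(control, knockout, pvalues, fold=1.5, alpha=0.05):
    """Return genes reduced at least `fold`-fold in the knockout with p < alpha.

    Assumption: dependence is scored as decreased expression in the knockout.
    """
    selected = set()
    for gene, ctrl in control.items():
        ko = knockout[gene]
        if ko > 0 and ctrl / ko >= fold and pvalues[gene] < alpha:
            selected.add(gene)
    return selected


# Illustrative expression values and p values only (not data from the study).
wt = {"Ttr": 120.0, "Maoa": 80.0, "Ins2": 60.0, "Gapdh": 1000.0}
fezf2_ko = {"Ttr": 30.0, "Maoa": 40.0, "Ins2": 58.0, "Gapdh": 990.0}
fezf2_p = {"Ttr": 0.001, "Maoa": 0.02, "Ins2": 0.6, "Gapdh": 0.9}
aire_ko = {"Ttr": 118.0, "Maoa": 79.0, "Ins2": 10.0, "Gapdh": 1005.0}
aire_p = {"Ttr": 0.8, "Maoa": 0.7, "Ins2": 0.003, "Gapdh": 0.9}

fezf2_dep = dependent_genes(wt, fezf2_ko, fezf2_p)
aire_dep = dependent_genes(wt, aire_ko, aire_p)
shared = fezf2_dep & aire_dep  # genes passing the cutoff in both comparisons

print(sorted(fezf2_dep), sorted(aire_dep), sorted(shared))
```

Restricting both dictionaries to the same gene keys mirrors the use of common probes, so that the overlap is computed on a shared gene universe.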
stromal cells were prepared according to a previous report (Hikosaka et al.,
2008), and sorted with a FACSAria III (BD Biosciences). 

qRT-PCR 

To detect the expression of Fezf2, Aire, and TRA mRNAs, total RNA was ex-
tracted from mTECs using an RNeasy mini or micro kit (Invitrogen). Total 
RNA was subjected to reverse transcription with a first-strand synthesis super- 
mix for qRT-PCR (Invitrogen). The amplification conditions were previously 
described (Komatsu et al., 2014). Gene expression was normalized to
Gapdh expression.



Detection of Inflammatory Cell Infiltration 
and Autoantibodies 

Organs from the mice were harvested and fixed 
with paraformaldehyde (Nacalai), embedded in 
paraffin, sectioned, and stained with H&E. Sera 
from the WT/nu, KO/nu, Aire−/−, and Foxn1-Cre+
Fezf2flox/− mice were prepared from tail vein blood.
For the detection of the autoantibodies, serial
frozen sections of the lung, joint, eye, and skin from Rag1−/− mice were
incubated with diluted sera in 0.01% Triton-PBS (1:50) followed by Alexa- 
Fluor-555-conjugated goat anti-mouse immunoglobulin G (IgG) polyclonal 
antibodies (Invitrogen). 

Luciferase Reporter Assay 

Cells from the mTEC line 1C6 were transfected with pNluc plasmids or
pNluc-promoter constructs together with Fezf2 expression or control vectors
using Lipofectamine 2000 (Invitrogen) and incubated for 24 hr. Signals were
detected with the Nano-Glo luciferase assay system (Promega) and quantified with
a plate reader (Berthold MicroLumatPlus LB96V 96-well luminometer).



Microarray Analysis 

Total RNA of mTECs isolated by cell sorting from WT mice and Fezf2~'~ 
mice was extracted with an RNeasy Micro Kit (QIAGEN) and processed 
for microarray analysis, as previously described (Komatsu et al., 2014). 
GeneChip analysis was performed using the Affymetrix mouse genome 



Cytometric Beads Assay 

The mouse inflammatory cytokines or chemokines were detected by cytomet- 
ric bead assay kits (BD Biosciences) and analyzed with a FACSCanto II (BD 
Biosciences). 



(B) qRT-PCR analysis of Fezf2 mRNA expression in mTECs (n = 3 per genotype).

(C) Flow cytometric analysis of Fezf2 protein expression in mTECs from WT and Aire−/− mice. Right panel, the mean fluorescence intensity (MFI) ratio (n = 8 per
genotype).

(D) Fezf2 mRNA expression in mTECs from WT and Aire−/− mice (n = 5 per genotype).

(E) Histochemical analyses of Fezf2 protein expression in the WT and Aire−/− thymus. Scale bars, 100 μm.

(F) Flow cytometric analysis of the fraction of mTECs (CD80+ cells) among the total TECs (EpCAM+CD45− cells) in Ltbr+/− and Ltbr−/− mice (n = 4 per genotype).

(G) The decreased mRNA expression of Fezf2 in Ltbr−/− mTECs compared to the WT (n = 4 per genotype).

(H) Fezf2 protein expression in mTECs in Ltbr+/− and Ltbr−/− mice (n = 4 per genotype).

(I) Histochemical analyses of Fezf2 protein expression in the Ltbr+/− and Ltbr−/− thymus. Scale bars, 100 μm.

(J) Fezf2-dependent or Aire-dependent TRA mRNA expression in mTECs in Ltbr+/− and Ltbr−/− mice (n = 4 per genotype). Statistical data are shown as the mean
± SD. NS, not significant; *p < 0.05; **p < 0.01.

See also Figure S6.

ELISA

The autoantibodies in the mouse serum were analyzed using ELISA kits
(Southern Biotech) and quantified with a plate reader (Bio-Rad iMark Micro-
plate Reader), as previously described (Komatsu et al., 2014). Mouse recom-
binant proteins were transthyretin (Uscn, RPA726Mu01), monoamine oxidase
A (Usbio, 155899), calbindin (Uscn, RPG438Mu01), and interstitial retinol bind-
ing protein (Uscn, RPA367Mu01).

Statistical Analysis 

Statistical analysis was performed using one-way ANOVA followed by post
hoc Bonferroni test when applicable or unpaired two-tailed t test (*p < 0.05;
**p < 0.01; ***p < 0.001; NS, not significant, throughout the paper). All data
are expressed as the mean ± SD. Results are representative of more than
two independent experiments.
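
As an illustration of the conventions above, the following minimal sketch computes an unpaired (pooled-variance) t statistic, applies a Bonferroni adjustment to a list of p values, and maps p values to the asterisk notation used throughout the paper. The function names and the input values are hypothetical; converting the t statistic to a p value would require a statistics package and is not shown.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def bonferroni(pvalues):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def stars(p):
    """Significance notation: *p < 0.05; **p < 0.01; ***p < 0.001; NS otherwise."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Illustrative values only.
t_stat = unpaired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
adjusted = bonferroni([0.01, 0.02, 0.2])
labels = [stars(p) for p in adjusted]
print(round(t_stat, 3), adjusted, labels)
```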

Detailed methods can be found in the Supplemental Information.

SUMMARY

While antibody titers and neutralization are consid- 
ered the gold standard for the selection of successful 
vaccines, these parameters are often inadequate 
predictors of protective immunity. As antibodies 
mediate an array of extra-neutralizing Fc functions, 
when neutralization fails to predict protection, inves- 
tigating Fc-mediated activity may help identify immu- 
nological correlates and mechanism(s) of humoral 
protection. Here, we used an integrative approach 
termed Systems Serology to analyze relationships 
among humoral responses elicited in four HIV vac- 
cine trials. Each vaccine regimen induced a unique 
humoral “Fc fingerprint.” Moreover, analysis of 
case:control data from the first moderately protec-
tive HIV vaccine trial, RV144, pointed to mechanistic 
insights into immune complex composition that 
may underlie protective immunity to HIV. Thus, 
multi-dimensional relational comparisons of vaccine 
humoral fingerprints offer a unique approach for the 
evaluation and design of novel vaccines against 
pathogens for which correlates of protection remain 
elusive. 

INTRODUCTION 

Although over 80 vaccines, covering more than 20 diseases, 
have been licensed in the United States, vaccine design efforts 
against persisting infections, including malaria, tuberculosis, 
and HIV, continue to fail. These setbacks have driven a shift
from empirical vaccine design approaches to rational vaccine
development strategies that consider pathogen life cycle, path- 
ogen structural information, and immunological correlates of 
protection. However, the immune correlates for most globally le- 
thal pathogens have yet to be defined, complicating vaccine 
design efforts. Prospective immunogens are frequently chosen 
based on measures of antibody (Ab) titer and neutralization, 
regardless of their mechanistic effects in immunity. However, 
for most clinically approved vaccines, titer and neutralization ac- 
tivity alone do not account for protective immunity (Pulendran 
and Ahmed, 2011). Instead, protective immunity is often observ-
able in the absence of neutralization, and accumulating evidence 
across a spectrum of vaccines has suggested a critical role for 
extra-neutralizing Ab functions such as Ab-dependent cellular 
cytotoxicity (ADCC), Ab-dependent cellular phagocytosis 
(ADCP), Ab-dependent complement deposition (ADCD), and 
Ab-dependent respiratory burst (ADRB) in both protection from 
and post-infection control of HIV (Barouch et al., 2015; Bourna-
zos et al., 2014; Hessell et al., 2007), influenza (DiLillo et al., 2014;
Jegerlehner et al., 2004), herpes simplex virus (HSV) (Kohl and 
Loo, 1982; Kohl et al., 1981), Ebola virus (Warfield et al., 2007), 
and malaria (Joos et al., 2010; Osier et al., 2014). 

Following vaccination, Abs targeting an extensive array of epi- 
topes with different affinities and Fc-effector profiles collectively 
contribute to the formation of immune complexes that direct 
antimicrobial functions via their constant domains (Fc). In addi- 
tion to the rapid diversification of the antigen (Ag)-binding 
domain (Fab), the Fc domain is also rapidly tuned during an im- 
mune response, altering the affinity of Ab interactions with innate 
immune receptors (e.g., Fc receptors and complement) ex- 
pressed on all innate immune cells (Ackerman and Alter, 2013; 
Chung and Alter, 2014). The diversity of Fc profiles, potential 
Fab variants, and tissue-specific Fc receptor expression results

Figure 1. System Serology Analysis

This Systems Serology platform allows for the broad characterization of the polyclonal extra-neutralizing IgG immune profile induced by vaccination. IgG was 
purified from subjects enrolled in four different HIV vaccine trials (RV144, VAX003, HVTN204, and IPCAVD001). Six Fc-effector functions and 58 biophysical 
measurements were assayed (complete list described in Table S1). All 64 parameters were collected to create an extra-neutralizing serological signature for the 
four vaccine trials, using an array of unsupervised and supervised machine learning algorithms. 

See also Tables S1 and S2. 



in a flexible humoral immune response poised for the elimination 
of pathogens via mechanisms beyond simple neutralization. 
Hence, analytical approaches able to integrate diverse facets 
of the humoral immune response will be critical to: (1) define un- 
expected correlates of protection from infection in protection 
studies or studies of natural disease resistance, (2) guide the se- 
lection of promising vaccines/immunogens through principled 
analysis of humoral immune profiles, and (3) define the relation- 
ships between Ab populations and functions that point to mech- 
anisms of protective immunity. 

As a prominent example, the ability to select HIV vaccine can- 
didates has been hindered by an inadequate understanding of 
the immunological correlates of protection from HIV. However, 
several clinical trials have been conducted, one of which 
(RV144) demonstrated a modest level of protection (31.2% 
reduction in the risk of infection) (Rerks-Ngarm et al., 2009), 
potentially harboring clues that may guide future vaccine 
development. This protection was observed in the absence of 
neutralizing Abs, cytotoxic T-cell responses, and high Ab titers. 
Univariate and multivariate logistic regression analyses linked 
the reduced risk of infection with non-immunoglobulin (Ig)A Ab
responses targeting the V1V2 region of the HIV envelope and
ADCC activity (Haynes et al., 2012; Zolla-Pazner et al., 2014). 
Follow-up analyses identified additional features of the humoral 
immune response associated with protection, including the pref-
erential induction of IgG3 responses, which coordinated multiple
Ab effector functions, including ADCC and ADCP (Chung et al., 
2014b; Yates et al., 2014). However, in the correlates analysis, 
although many Ab assays were initially considered, the identifi- 
cation of immune correlates in RV144 was constrained by the 
selected assays that deeply interrogated neutralization and Ab 
specificity but profiled only a limited set of Fc features, including 
only a few Ab subclasses/isotypes (IgG, IgG3, IgA) and a single
function, ADCC. 

Here, we aimed to consider more integrative and network- 
oriented relationships between a broader array of polyclonal 
Ab features and functional properties associated with vaccine 



regimens and outcomes. As an initial test of this approach, 
termed “Systems Serology,” we examined recent HIV vaccine 
trials, including that of the moderately protective RV144 vaccine 
ALVAC/AIDSVAX B/E (Rerks-Ngarm et al., 2009), two trials 
that did not demonstrate efficacy in phase 2b trials (VAX003;
AIDSVAX B/E [Pitisuttithum et al., 2006] and HVTN204; DNA/ 
rAD5 [Churchyard et al., 2011]), and one experimental phase 1 
study designed to evaluate the prototype vaccine Ad26 
vector (IPCAVD001; Ad26.ENVA.01) (Barouch et al., 2013a). A 
battery of modeling techniques that emphasize co-variation 
among measurements was applied to these data, revealing 
features of vaccine-induced “fingerprints” that offer new in- 
sights concerning polyclonal Ab immune responses elicited by 
vaccines. 

RESULTS 

Systems Serology 

Beyond their role in neutralization, Abs mediate a vast array of 
additional functions via their Fc domains. Thus, a Systems 
Serology approach was developed to broadly profile the extra- 
neutralizing Ab activity of vaccine-induced polyclonal Abs (Fig- 
ure 1). The initial platform interrogated six Fc-effector functions 
(ADCC, ADCP, ADCD, and three Ab-dependent natural killer
[NK] cell activities) (Figure 1). Linked to these six functions, 58 bio-
physical measurements were simultaneously captured, including
binding to Fcγ receptors (FCGRs) and the relative abundances of
an array of Ag-specific Abs (Table S1) in 120 samples from four
HIV vaccine trials (see Supplemental Experimental Procedures). 

Identification of Vaccine-Specific Signatures 

Unsupervised hierarchical clustering grouped vaccine regimens 
primarily by immunogen type (Figure 2; Table S2), including an 
adenovirus (Ad) vector cluster composed of mixed HVTN204 
(DNA/Ad5) and IPCAVD001 (Ad26) samples (Figure 2, cluster 1:
green and yellow, respectively) and a protein immunogen cluster 
containing largely mixed VAX003 (protein alone) and RV144

Figure 2. Hierarchical Clustering of Vaccine Trial Profiles by Biophysical Properties and Functional Responses

Data were compiled for the four different vaccine trials. Each column represents the full Ab profile of an individual subject. Colored bars along the bottom
correspond to the vaccine trial for each subject. Ab properties are grouped by generalized features (Function, FcR affinity, Bulk IgG, IgG1, IgG2, IgG3, and IgG4),
indicated by the colored bars on the right. Specific features are listed in Table S2.

See also Table S1. 



(poxvirus prime/protein boost) samples (Figure 2, cluster 2: blue 
and red, respectively). While this clustering highlights the domi- 
nant influence of immunogen type in directing distinct humoral 
profiles, specific features driving this separation cannot be 
clearly discerned. 
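
The idea behind unsupervised hierarchical clustering of subject profiles can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: Euclidean distance and average linkage are assumed choices, the function name is hypothetical, and the four two-feature "profiles" are made-up numbers standing in for the 64-feature Ab profiles.

```python
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Cluster-to-cluster distance is the mean pairwise Euclidean distance
    between members (average linkage, an assumed choice).
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        total = sum(dist(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical profiles: subjects 0-1 resemble one regimen, 2-3 another.
profiles = [[5.0, 0.2], [4.8, 0.1], [0.3, 4.9], [0.1, 5.2]]
groups = average_linkage(profiles, n_clusters=2)
print(groups)
```

In practice, features measured on different scales would typically be standardized before distances are computed, so that no single assay dominates the grouping.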

To gain enhanced resolution on the key features contributing 
to profile differences, a multidimensional combined feature 
selection method (the least absolute shrinkage and selection 
operator; LASSO) (Tibshirani, 1997) and partial least-squares 
discriminant analysis (PLSDA) (Arnold et al., 2015; Lau et al., 
2011) were used. Focusing initially on RV144 and VAX003, which
shared the same protein immunogen but provided different effi- 
cacies, as few as 7 of the 64 features accounted for 76% of the 
variance across the two trials, driving nearly complete resolution 
of the vaccine profiles (Figures 3A and 3B). Separation of the Ab 
profiles was observed in the scores plot, with points representing 
individual RV144 (red) or VAX003 (blue) vaccinees (Figure 3A). 
Differences between vaccine-elicited Ab profiles were largely 
captured along the first dimension (LV1), which accounted for 
the majority of the variance between the two trials (61%). The
corresponding loadings plot (Figure 3B) illustrates the contribu- 
tion of the seven LASSO features, where the relative location 
of an individual feature is associated with the corresponding 
vaccine subpopulation in the scores plot (Figure 3A). Elevated 



gp120-specific IgG3 levels, relative to other features (Figure 3B),
uniquely marked the RV144 vaccine profile (Figure 3A), as previ-
ously described (Chung et al., 2014b; Yates et al., 2014). By
contrast, the VAX003 Ab profile was associated with known in-
duction of higher total HIV gp140-specific Ab titers, dominated
by IgG4 (Chung et al., 2014b). However, additional novel features
were identified that associated with the non-protective VAX003
profile, including elevated total gp140-specific responses,
higher Ab-driven NK cell degranulation, and chemokine secre-
tion. This result suggests that differences in relationships be-
tween Ab features rather than the total Ab amount may be essen-
tial for resolving “protective” from “non-protective” vaccine
profiles. Moreover, the scores plot highlights an unappreciated
level of heterogeneity among the RV144 vaccinees, with respect
to the magnitude of the IgG3 response, where 26% of the RV144
vaccinees exhibited a more highly skewed IgG3 response spe-
cifically across the second dimension, LV2 (Figure 3A).
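
How an L1 penalty reduces a feature set to a small number of discriminating features can be illustrated with a minimal sketch. It assumes centered features and a class label coded as +1/-1, and it minimizes (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent; this is a generic LASSO illustration with hypothetical data, not the authors' pipeline, and the subsequent PLSDA step is not shown.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm else 0.0
    return beta


# Hypothetical data: feature 0 tracks the class label, feature 1 is uninformative.
X = [[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]
y = [1.0, 1.0, -1.0, -1.0]  # e.g., +1 for one trial, -1 for the other
beta = lasso(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

Features whose coefficients are shrunk to exactly zero drop out of the model, which is why the penalty acts as a feature selector before a discriminant analysis is fit on the retained features.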

When all four vaccine trials were analyzed simultaneously, 15 
of the 64 features separated the vaccine profiles, accounting 
for 57% of the variance. The first dimension (LV1) revealed a 
similar separation as the hierarchical clustering analysis, sepa- 
rating based on protein (Figure 3C, right) versus Ad-based 
vectored immunization (33% variance), confirming the dominant 
effect of immunogen type in directing humoral profiles. LV1

Figure 3. PLSDA and LASSO Identify Unique
Combinations of Features that Differentiate 
Vaccine Trial Ab Profiles 

(A and B) In (A), the scores plot represents the 
RV144 (red) and VAX003 (blue) vaccine profile 
distribution for each vaccinee tested (dots) from 
the LASSO and PLSDA. Remarkably, as few as 
seven Ab features, listed on the loadings plot (B), 
separated the vaccine profiles with 100% cali- 
bration and 97% cross-validation accuracy. LV1 
captured 61% of X variance and 72% of the Y
variance. 

(C and D) In (C), LASSO and PLSDA of all four 
vaccine profiles identified 15 Ab features (D) able 
to discriminate between the distinct vaccine 
regimens (red, RV144; blue, VAX003; green, 
HVTN204; and yellow, IPCAVD001) with 84% 
cross-validation accuracy. Together, LV1 and LV2 
captured 57% of the X variance and 45% of the Y 
variance, respectively. 

See also Tables S1 and S2.

separation was strongly driven by gp41-skewed immunity, due
to gp41 being included in the Ad regimens but not included in 
VAX003 and only partially included in RV144. Furthermore, Abs 
targeting clade AE Ags (gp120 and V1V2) uniquely marked 
RV144 and VAX003 profiles, as subjects were immunized with 
clade AE-derived immunogens. Thus the Ag itself, rather than 
the vector/immunization regimen alone, was a critical determi- 
nant influencing vaccine-induced humoral profiles. 

The second dimension identified additional features that 
further split the vaccine profiles, accounting for an additional 
24% of the variance, contributing to an unexpected grouping 
and separation of RV144/Ad26 and VAX003/Ad5 profiles. This 
separation was primarily related to differences in IgG3 subclass
and V1V2 levels, which scattered in multidimensional space 
more closely with RV144 and Ad26 profiles (Figure 3D). There-
fore, markers previously associated with reduced risk of infec-
tion in RV144 co-segregated with the experimental Ad26 vaccine
trial, which used a vector similar to the ones used in regimens 
recently shown to protect non-human primates from infection 
through non-neutralizing polyfunctional Abs (Barouch et al., 
2015). 

Thus, use of the LASSO and PLSDA, incorporating co-varia- 
tion between features, identified key variables involved in classi- 
fying vaccine regimens and provided enhanced resolution of 
the specific Ab features associated with differentiating vaccine 
profiles, objectively identifying novel correlates of Ab-mediated 
protection. 

Correlation Networks Highlight Distinct Humoral 
Relationships 

Next, we aimed to gain insights into relationships between 
features contributing to differences among vaccine-induced 
polyclonal profiles, adapting correlation network analysis tools 
commonly used in the transcriptomics field. The resulting 



network models revealed remarkably 
different Ab co-regulation interactions 
among the vaccine regimens, providing 
novel insights into the specific Ab features that may contribute 
to unique vaccine effector profiles. 
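
A correlation network of this kind can be sketched in a few lines. The example below is an assumption-laden illustration: it uses Pearson correlation and a fixed |r| threshold as a stand-in for the significance test, the feature names are hypothetical labels, and the measurements are invented values for five subjects.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(features, threshold=0.8):
    """Connect two features with an edge when |r| reaches the threshold.

    Returns the edges (with their correlation as weight) and each node's
    degree, i.e., the number of features it is connected to.
    """
    edges = {}
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    degree = {name: 0 for name in features}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree


# Hypothetical measurements across five subjects.
features = {
    "IgG1_gp120": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ADCP": [2.0, 4.0, 6.0, 8.0, 11.0],
    "IgG4_gp120": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges, degree = correlation_network(features)
print(edges, degree)
```

Node degree computed this way corresponds to the "connectedness" that is drawn as node size in a network plot, and the stored correlation can be used as the edge weight.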

VAX003 exhibited the most interconnected network, 
comprising four dense subnetworks (Figure 4A). The most prom- 
inent subnetworks included an unusual tightly tethered mixture 
of IgG2 and IgG3 responses that are rarely co-selected (Chaud-
huri and Alt, 2004; Chaudhuri et al., 2007), pointing to the induc-
tion of a non-coherent poorly coordinated functional response
(Chung et al., 2014b). Interestingly, all Fc-effector functions
were connected to a third subnetwork consisting largely of
IgG1 and bulk IgG responses specific for a broad array of Ags
that was unexpectedly connected to the fourth IgG4 subnet-
work. IgG4 Abs have previously been shown to compete actively for
immune complex occupancy, resulting in dampened Ab function
(Chung et al., 2014b). Thus, the VAX003 network exhibited linked
IgG1/IgG4 responses staggered next to a dense IgG2/IgG3
cluster, highlighting the peculiar subclass co-selection profiles 
driven by the non-protective VAX003 strategy. 

While less prominent clusters emerged in the RV144 network 
model (Figure 4B), ADCP, ADCD, and ADCC were largely teth-
ered to a network of gp120-, gp140-, or V1V2-specific IgG1
and/or IgG3 responses. The V1V2B-specific IgG3 response
was highly associated with the large IgG1 network, suggesting
that high IgG3 V1V2B-specific responses act as a critical surro-
gate of a coordinated IgG3 and IgG1 response. The total IgG
V1V2AE response was directly tethered to both ADCP and
ADCC, suggesting that this specific V1V2 response may play
an influential role in driving Ab functionality. IgG3 V1V2B and
IgG3 V1V2AE responses were not directly correlated, suggesting
that these V1V2 responses may represent disparate humoral im-
mune responses rather than a single cross-reactive response.
Moreover, because depletion of IgG3 results in only a 30% re-
duction in Ab functionality (Chung et al., 2014b), it is likely that
IgG3 responses may serve as a surrogate for a subpopulation

Figure 4. Correlation Networks of Vaccine-Trial-Elicited Humoral Immune Responses Probe Immune Complex Dynamics

(A-D) Correlation networks were generated for VAX003 (A), RV144 (B), HVTN204 (C), and IPCAVD001 (D). Each node (circle) represents either a biophysical 
feature or an effector function. Nodes are connected with an edge (line) if they are significantly correlated. The different Ab isotypes are identified by different 
colors as indicated. Edge thickness and color intensity of the connecting lines are directly proportional to statistical significance and edge weight, respectively 
(thicker and brighter network interactions represent a stronger correlation). The size of each node is directly proportional to its degree of connectedness (i.e., the 
number of features to which that node is connected). 

See also Figure S1 and Tables S1 and S2. 



of vaccine-induced IgG1 Abs that direct the polyfunctional Ab re-
sponses observed in RV144.

The HVTN204 (DNA/Ad5) network (Figure 4C) contained a 
highly connected subnetwork, with multiple tethers to less 
well-connected subnetworks of additional Ab subclasses. The 
dominant subnetwork consisted of IgG1 Env- and V1V2-specific
responses, with ADCC, ADCD, and NK cell responses tightly
intercalated within the subnetwork, sandwiched between IgG1
and IgG3 responses. However, ADCP did not appear in the
network. This exclusion of ADCP suggests that Ad5 and/or
DNA may preclude the induction of phagocytic Ab responses,
which have been linked to protection from SIV acquisition 
(Barouch et al., 2013b). 

Conversely, the vaccine profile induced by the experimental 
IPCAVD001 (Ad26) exhibited a nearly single, densely connected 
network tethered to Ab functions and Fc-receptor binding activ- 
ity (Figure 4D). The large network consisted of a tight grid of 
related bulk IgG/IgG1 responses, while IgG2, IgG3, and IgG4
formed sparse external clusters, including a less functional, in-
terconnected IgG2/IgG4 cluster (Figure 4D, top right). The clear
linkages between Ab functions and IgG1 features, including an
IgG1 V1V2-driven ADCP response, further support the potential
role of IgG3 as a surrogate of a highly effective, polyfunctional
IgG1 response.



Overall, these statistically robust network analyses (Figure S1)
point to unique relationships between all features and functions 
among the four vaccine trials. Identification of “desirable” Ab 
networks delineating specific biophysical Ab feature/function 
relationships that are associated with protective immunity may 
help identify mechanisms underlying correlates, such as the as-
sociation of IgG3 and V1V2 features with reduced risk of HIV
infection by RV144. 

System Serology Analysis of Interactions between 
RV144 Surrogates of Reduced Risk of Infection 

Systems Serology approaches can complement existing
methods for identifying predictive mechanism(s) of protective
immunity. While logistic regression involves stepwise evaluation
of strongly correlated individual variables, Systems Serology
approaches can additionally identify relationships between Ab
features that are predictive of protection. Toward this purpose,
we next examined RV144 profiles segregating with known
correlates of reduced risk of infection. Thus, we dissected two
Ab features (IgG/IgG3 V1V2), which were previously positively
associated with reduced acquisition in the RV144 case:control
analysis, in our cohort of uninfected vaccinees. Importantly,
IgA levels were also included in this analysis, due to their
implicated role as correlates of risk (Haynes et al., 2012). Profiles
were then compared between “responders” (top 33% for each
correlate of reduced risk) (Rerks-Ngarm et al., 2009; Zolla-Pazner
et al., 2014) and “non-responders.” The IgG V1V2 responder
profile (Zolla-Pazner et al., 2014) was driven by 16 features
(Figures 5A and 5B), including elevated V1V2 responses and a
polyfunctional Fc-effector profile linked to higher Ab-dependent NK
cell degranulation (i.e., CD107a and interferon γ [IFNγ]
expression), ADCP, and ADCC. Conversely, the non-responders
exhibited elevated gp120-specific IgA and increased binding to
FCGR2B, the sole inhibitory Fcγ receptor, both features that
have been previously associated with antagonism of Fc-effector
activity (Tomaras et al., 2013; White et al., 2014).
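The responder/non-responder split described above can be illustrated with a short sketch. This is not the authors' code: the function name, the tie handling, and the example values are assumptions made for illustration, and only the "top 33%" cutoff comes from the text.

```python
def top_fraction_responders(values, fraction=1 / 3):
    """Flag the top `fraction` of subjects as responders (True).

    Assumption: ties at the cutoff are all counted as responders, so the
    responder group can be slightly larger than the nominal fraction.
    """
    n_top = max(1, round(len(values) * fraction))
    cutoff = sorted(values, reverse=True)[n_top - 1]
    return [v >= cutoff for v in values]


# Hypothetical IgG V1V2 binding levels for six vaccinees.
levels = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
flags = top_fraction_responders(levels)
```

In this toy example the two highest values are labeled responders; the resulting labels would then serve as the class variable for a downstream feature-selection step.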

Similar analysis of the IgG3 V1V2 correlate of reduced risk
pointed to ten Ab features that distinguished responder/non-responder
profiles (Figures 5C and 5D) marked by increased
broad Fcγ receptor binding among responders, particularly to
activating FCGR2A, involved in ADCP, and to FCGR3A, critical
for NK cell degranulation and chemokine secretion. Surprisingly,
IgA was not selected as a negative predictor of the IgG3 V1V2
responder profile. These findings confirm that IgG V1V2
(Figure 5A) responders exhibit a balanced polyfunctional profile,
while IgG3 V1V2 responders (Figure 5B) possessed Abs
selectively enhanced for binding to FCGR2A, associated with
ADCP, that has been linked to protection in nonhuman primates
(NHPs) (Barouch et al., 2013b).

Defining Integrative Signatures of Protective Humoral
Immune Profiles in RV144

Finally, to assess whether our approach could provide enhanced 
resolution of mechanism(s) of potential reduced risk of infection 
in the RV144 trial, we next analyzed data from the case:control 
study (Haynes et al., 2012). Specifically, data characterizing 
distinct Ab subclass levels targeting multiple vaccine Ags and 



Figure 5. Identification of V1V2^high-Associated Signatures within
RV144 Vaccine Responses

(A) RV144 vaccinees were classified within the IgG
V1V2AE^high (blue) (top 30%) or IgG V1V2AE^low (red)
groups.

(B) LASSO identified a profile of 16 features that
differentiated the two groups with 100% calibration
and 80% cross-validation accuracy. The loadings
plot (right panel) illustrates the features that
separated IgG V1V2AE^high or IgG V1V2AE^low responders.
Together, LV1 and LV2 captured 33% of the X
variance and 94% of the Y variance, respectively.

(C) The same analysis was repeated for RV144
vaccinees classified as IgG3 V1V2^high/IgG3 V1V2^low,
with 92% cross-validation and 100% calibration
accuracy.

(D) LASSO identified a signature of ten features that
best separated these two groups. Together, LV1
and LV2 captured 39% of the variance in X and
84% of the variance in Y, respectively.

See also Tables S1 and S2.



functions comparable to those included 
in our original profiling data were included 
in the analysis. PLSDA using data from all 
cases and controls separated placebos 
from vaccinees, as expected, along LV1 
(Figure 6A). In contrast, PLSDA of vaccinees alone was unable 
to separate the 40 infected from the 201 uninfected vaccinees 
included in the case:control analysis (Figures S2A and S2B). 
Similarly, network analyses showed only modest differences be- 
tween vaccinated cases and controls (Figures S2C and S2D), 
likely related to the fact that it is unclear which uninfected vacci- 
nees were actually exposed and protected. 

To address this complication, we defined groups representing
extreme profiles based on known correlates of risk (Haynes
et al., 2012). Given that the IgG3 and IgG V1V2 levels were highly
correlated (Figure S3), we elected to focus on the IgG V1V2 and
IgA relationship due to the intriguing relationships found for these
two parameters in the non-case:control data (Figure 5A). Two
sets of samples were identified: (1) a region containing the
greatest ratio of uninfected:infected vaccinees was classified as the
“low-risk” group (Figure 6C, blue box; percentage difference =
28%, p = 0.0088), and (2) the area that contained the lowest
ratio of uninfected:infected vaccinees was classified as the
“risk” group (Figure 6C, red box; percentage difference =
-26%, p = 0.0003). As expected, the lowest frequency of
infections was observed in the IgG V1V2^high/IgA^low region of the plot,
and the highest frequency of cases was observed in the IgG
V1V2^low/IgA^high group. PLSDA analyses clearly separated these
two groups (Figure 6D), with the low-risk group largely
associated with features (Figure 6E, positive loadings) that mark high
IgG responses against the V1V2A scaffold as well as the
V1V2-169K scaffold, corresponding to Ab responses against the viral
variant able to evade the vaccine response among the infected
vaccinees (Holland et al., 2012).

Correlation networks further pointed to distinct profiles
between the two groups. Three subnetworks were observed in the
low-risk case:controls: an independent small network of IgA
features and two larger linked clusters, including (1) all IgG3
features and (2) IgG responses tethered to Ab functional features
(Figure 6F). These two clusters contained a single link between
an IgG3 response and IgG response directed at the same
V1V2C scaffold, which was linked to all other V1V2 scaffold
responses. Again, this suggests that the IgG3 response may be a
surrogate of a highly functional IgG1 response more directly
involved in modulating Ab functionality. Conversely, the risk
group exhibited five clusters (Figure 6G), of which four were small
groups that appeared to form relationships independent of all the
IgG3 features. One of the small clusters, separate from IgG3 and
all functions, included several IgG V1V2 responses, highlighting a
unique structure of the humoral response among the risk group.
By contrast, all IgG3 features were tightly interconnected and
directly tethered to IgG features and the primary ADCC and
neutralization results but not to the secondary ADCC features.

These findings indicate a mis-coordinated IgG/IgG3 V1V2
response largely separate from Ab function in the vaccinees
who went on to become infected, whereas IgG/IgG3 V1V2
responses were well integrated within the network profile in
vaccinees with reduced correlates of risk (i.e., IgG V1V2^high/IgA^low).
Even though many of the desirable features (in particular,
polyfunctional responses identified in Figure 5) were not
available for analysis, these data highlight the IgG V1V2 responses
that likely drive protective immunity.

DISCUSSION 

Because the humoral immune response consists of waves of B 
cell responses that progressively induce higher affinity, broadly 
targeting, and functionally enhanced complexes of Abs poised 
to eliminate a pathogen, we aimed to develop a multivariate 
approach that could capture the complexity of interactions 
between Abs at unprecedented depths. The Systems Serology 
approach described here not only identified features reported 
in previous correlates analyses, including elevated IgG3
responses in RV144 (Chung et al., 2014b; Yates et al., 2014) and
Ab binding to V1V2 (Zolla-Pazner et al., 2014), but also pointed
to largely indirect connections between V1V2 IgG or IgG3
responses and Ab function (ADCC, ADCP, and ADCD) in vaccinees
(Figure 4B) and the low-risk RV144 case:control samples
(Figure 6F). Instead, vaccine-specific IgG1 responses were largely
directly tethered to Ab function (Figure 4B). This suggests that
the IgG3 “protective” signatures may either represent a
surrogate of an effective Ab response or only contribute in
combination with multiple other Ab features (e.g., IgG1) to induce antiviral
activity. Along these lines, while depletion of IgG3 Abs from
RV144 vaccinees resulted in a significant loss of ADCP and
ADCC activity, the activity was not completely depleted with
the removal of this subclass of Abs (Chung et al., 2014b),
suggesting that IgG3 Abs alone do not mediate the activity in
polyclonal RV144 sera and that function was also mediated by Abs
remaining in the depleted purified IgGs. Therefore, the induction
of IgG3 responses in RV144 may mark the coordinated
production of highly functional IgG1 responses that may be functionally
enhanced through altered IgG1 glycosylation, known to impact
Fc-receptor affinity (Chung et al., 2014a), rather than subclass
selection differences alone (Chung et al., 2014a; Hristodorov
et al., 2013). Thus, together with the IgG3 Abs, these IgG1 Abs
may form highly functional immune complexes that are able to
rapidly and effectively clear the virus or infected cells.

Interestingly, while vaccine-induced IgA responses were
associated with enhanced risk of infection (Haynes et al., 2012), IgA
emerged as an antagonist of the IgG V1V2 (Figures 5A and 5B)
response but not the IgG3 response in the PLSDA analyses
(Figures 5C and 5D). Furthermore, IgA responses were not
connected to any of the subnetworks containing functional
responses identified in the network analyses, suggesting that IgA
responses may serve as a marker of a deregulated or less
functional humoral immune response rather than a direct antagonist
of protective humoral immune responses. Thus, while it is certain
that pre-incubation of Ag with IgA monoclonal Abs may prevent
IgG1 and IgG3 monoclonal Ab binding (Tomaras et al., 2013), it
does not appear that these responses were directly co-induced
(Figure 6). Moreover, given that the infected vaccinees exhibited
both the lowest and the highest levels of IgA responses
(Figure 6B), it is unlikely that IgA responses directly contributed to
impaired humoral immune protection. Likewise, monoclonal
therapeutics generated as IgAs exhibit potent cytotoxicity and
clearance of tumor targets through Fcα receptors expressed on
effector cells (e.g., neutrophils and macrophages) (Black et al.,
1996; Dechant and Valerius, 2001) and have been recently linked
to protection from simian-HIV (SHIV) challenge (Watkins et al.,
2013). Thus, future studies may aim to define the vaccine
strategies that most effectively co-select a highly functional blood IgG
response and a highly effective IgA response that may
collectively prevent infection at the portal of entry.

Protection from infectious diseases like HIV will likely require
the targeted containment of viral replication/dissemination at
the site of infection. Along these lines, HIV is transmitted across
mucosal barriers, where FcγR2-expressing monocytes/macrophages
are abundant (Brown and Mattapallil, 2014; Zigmond
and Jung, 2013). Moreover, ADCP activity was present in the
RV144, VAX003, and IPCAVD001 networks (Figure 4) but was
not observed in the HVTN204 network (Figure 4C) that was
highly skewed to the elicitation of NK-cell-mediated activities.
Conversely, ADCP was tightly tethered within the RV144 and
IPCAVD001 networks (Figure 4), was enhanced in the high
V1V2 IgG3/IgG1 RV144 vaccinees (Figure 5), and was previously
associated with protection in NHP (Barouch et al., 2013b). Thus,
these results raise the possibility that ADCP may represent a
critical function, within polyfunctional Ab profiles, that is required for
protection from mucosal transmission.

Beyond HIV, these vaccine-profiling approaches have broad
applications and can aid in vaccine design efforts against
many of the deadliest global pathogens for which immune
correlates of protection have yet to be elucidated. For example, recent
clinical evidence suggests that Abs present in Ebola-virus-infected
convalescent immune sera contribute to improved
clinical outcomes in infected patients (Kreil, 2015; Lyon et al.,
2014); and, recently, vesicular stomatitis virus (VSV) vaccination
has been shown to drive robust humoral immune responses
(Regules et al., 2015) that provide protection from infection
(Henao-Restrepo et al., 2015). However, the specific
mechanism(s) by which Abs provide protection remains unclear. Yet,
a non-neutralizing monoclonal Ab, 13c6, has been shown to
provide protection from infection in an Fc-dependent manner
(Olinger et al., 2012), suggesting that non-neutralizing Ab functions
contribute to antiviral immunity. Thus, similar to the application
of Systems Serology for the evaluation of HIV vaccine
responses, the application of a Systems-Serology-guided
dissection of natural humoral immune profiles that emerge in Ebola
virus survivors and vaccinees may provide insights into the
immunological correlates and mechanisms of protection that
may help guide future vaccine efforts.

Thus, in this article, Systems Serological profiling provides a
novel approach for the dissection of four HIV vaccine regimen 
profiles at unprecedented depths and a framework for dissecting 
the immune profiles that segregate with previously defined 
correlates of risk in efficacy studies. Systems Serology 
complements traditional multivariate approaches aimed at 
defining independent predictors of vaccine efficacy, aiding in 
the identification of Ab function/feature relationships that track 
with protective humoral immune profiles. Accordingly, these 
relational tools provide an additional powerful method for 
comparing the immune profiles of different vaccine groups/out- 
comes to provide greater mechanistic insights underlying the re- 
lationships of features that may contribute to immune control. 
While this analysis included 64 humoral features, many other fea- 
tures can be collected, including measures of neutralization, af- 
finity, Fc glycosylation, and so forth. Moreover, these techniques 
may be expanded to examine the protective/immunopatholog- 
ical role of Abs in non-infectious disease settings, including ma- 
lignancies and/or autoimmunity, as well as how Abs may differ 
among gender, ethnicity, and age. Thus, this study lays the 
groundwork for the evaluation, deep characterization, and com- 
parison of polyclonal vaccine profiles for many future vaccines, 
for which correlates of protective immunity are still elusive. 

EXPERIMENTAL PROCEDURES 

Vaccine Samples 

RV144 (Rerks-Ngarm et al., 2009): plasma samples from 30 vaccinated
subjects at week 26 (2 weeks after the final vaccination) were provided by the
Military HIV Research Program (MHRP). RV144 case:control study data were
provided by the RV144 study team. Serum samples from 30 vaccinated subjects
at month 30.5 (2 weeks after the final vaccination) were provided by the Global
Solutions for Infectious Disease (GSID). HVTN204 (Churchyard et al., 2011):
Serum samples from 30 vaccinated subjects at 2 weeks after the final
vaccination were provided by the National Institute of Allergy and Infectious Diseases
(NIAID) HIV Vaccine Trials Network (HVTN). IPCAVD 001 (Barouch et al.,
2013a): Serum samples from 30 vaccinated subjects at 2 weeks after the final
vaccination were provided by Dan Barouch. Detailed descriptions of each
vaccine are included in the Supplemental Information.

Purifying Bulk IgG

IgG was purified from all vaccine plasma and serum samples using Melon Gel 
columns according to the manufacturer’s instructions (Thermo Scientific), and 
the concentration was calculated using a human IgG ELISA kit (Mabtech). 

Ab-Functional Profiling

The following assays were performed to functionally profile the Fc-effector
functions of all vaccine Abs. In order to assess ADCP, a THP-1-based
ADCP assay was performed as previously described (Ackerman et al.,
2011). ADCC was assayed using a modified rapid fluorescent ADCC
(RFADCC), as previously described (Gomez-Roman et al., 2006; Chung
et al., 2014b). ADCD was assessed via the measurement of complement
component C3b deposition on the surface of target cells. Ab-dependent
NK cell degranulation and cytokine/chemokine secretion were measured
using the CEM-NKr CCR5+ T-lymphoblast cell line pulsed with vaccine-specific
gp120 (60 μg/ml), as previously described (Chung et al., 2014b). Detailed
methods of each functional assay are described in the Supplemental
Information.

Ab Biophysical Profiling 

The following assays were performed to assess the biophysical profile of 
each of the vaccine Ab samples. Ab affinity for FCGRs was determined using 
surface plasmon resonance as previously described (Chung et al., 2014a), 
while a customized Luminex isotype assay was used to quantify the relative 
concentration of each Ab isotype to a panel of HIV-specific Abs. Detailed 
methods of each of these profiling tools are included in the Supplemental 
Information. 

Identification of Vaccine-Specific Signatures with LASSO and 
PLSDA 

The minimum signature of Ab features and functional parameters useful
for differentiating vaccine groups was identified using the LASSO method
(Tibshirani, 1997) and implemented using MATLAB software (version 2014a,
MathWorks). PLSDA (Arnold et al., 2015; Lau et al., 2011) assessed the
predictive ability of LASSO-selected biomarkers for classifying vaccine groups.
A detailed description of validation and quality control for this analysis is
included in the Supplemental Information.
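To make the feature-selection idea concrete, the following is a minimal sketch of LASSO solved by cyclic coordinate descent. It is an illustration only and not the MATLAB implementation used in the study: it is written in Python, it fits a least-squares LASSO to class labels coded as -1/+1 (an assumed simplification of the classification problem), and the function names, penalty value, and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is a list of rows (subjects) of feature values; y is a list of
    responses. Features whose coefficient ends at 0 are de-selected.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data: feature 0 tracks the group label, feature 1 is unrelated.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [-1.0, -1.0, 1.0, 1.0]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Here only feature 0 survives the penalty. In a workflow like the one described above, the surviving features would then be passed to PLSDA to assess how well they classify the groups.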

Network Interactions 

Networks were constructed based on the pairwise correlation coefficients
between all biophysical features and functional responses. Edges between
nodes are weighted using significant correlation coefficients, $\rho_{ij}$, after correcting
for multiple comparisons (Benjamini-Hochberg q value < 0.05, testing the
hypothesis of zero correlation) as follows:

$$A_{ij} = \rho_{ij}^{\alpha}$$

with $\alpha = 6$.
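The edge-weighting step can be sketched as follows, assuming the pairwise correlation coefficients and their p values have already been computed. The sketch is illustrative rather than the authors' code: the function names and example values are hypothetical, the exponent is taken to be 6, and the absolute value is an added safeguard that changes nothing for an even exponent.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def edge_weights(rho, pvals, alpha=6, q_cutoff=0.05):
    """Keep edges with q < q_cutoff and weight them as |rho| ** alpha.

    rho and pvals are dicts keyed by (feature_i, feature_j) pairs.
    """
    keys = sorted(rho)
    q = benjamini_hochberg([pvals[k] for k in keys])
    return {k: abs(rho[k]) ** alpha for k, qk in zip(keys, q) if qk < q_cutoff}


# Hypothetical correlations among three features.
rho = {("a", "b"): 0.9, ("a", "c"): -0.5, ("b", "c"): 0.1}
pvals = {("a", "b"): 0.001, ("a", "c"): 0.2, ("b", "c"): 0.8}
weights = edge_weights(rho, pvals)
```

Raising the coefficient to a high power strongly down-weights moderate correlations (0.9 becomes about 0.53, whereas 0.5 would become about 0.016), so the network is dominated by the tightest feature relationships.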

To assess the significance of the variable groupings observed in the 
network, we calculated the network clustering coefficient for the original 



Figure 6. Defining Novel Signatures of Protection in the RV144 Case:Control Data 

(A) The PLSDA shows the distribution of all case:control data, including all infected and uninfected placebos as well as infected and uninfected vaccinees using
101 humoral features (described in Table S3). LV1 accounted for 68.1% of all variance, separating most placebos from the vaccinees, while LV2 only contributed to
4.5% of the variance.

(B) Further insights into the distribution of IgA gp120, IgG V1V2, and IgG3 V1V2 levels were analyzed using histograms demonstrating unique multi-modal
differences in feature distribution among the infected and uninfected vaccinees.

(C) The scatterplot, in the central panel, represents the bivariate distribution of IgA gp120 and IgG V1V2 in the vaccinees and is framed by the histogram
distributions for unidimensional reference. The blue and red dash-lined boxes represent quadrants within the data that constitute the fewest cases:controls (low
risk, blue) or the highest ratio of cases:controls (high risk, red).

(D and E) LASSO and PLSDA identified nine features that split low- and high-risk profile separation with 97.8% accuracy in cross-validation. Together, LV1 and
LV2 captured 70.4% of the X variance and 30.1% of the Y variance, respectively.

(F and G) Correlation networks were generated for both the low-risk (F) and high-risk (G) groups. 

See also Figures S2 and S3 and Table S3. 
network and for 100 randomized networks. Random networks are generated
by randomly swapping edges while preserving the degree of all nodes
(degree-preserving edge shuffle) (Figure S2).
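A degree-preserving edge shuffle and a clustering coefficient can be sketched as below. This is an assumed, simplified version for an unweighted, undirected graph: the source does not state which clustering coefficient was used, so the average local clustering coefficient is shown here as one common choice, and all function names and the example graph are hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double-edge swaps.

    Each accepted swap replaces (a, b) and (c, d) with (a, d) and (c, b),
    so every node keeps its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def degrees(edges):
    """Count the number of edges touching each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def average_clustering(edges):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        links = sum(1 for u in ns for v in ns if u < v and v in nbrs[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(nbrs)


# Hypothetical network: a six-node ring with one chord.
observed = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
null_values = [
    average_clustering(degree_preserving_shuffle(observed, 10, seed=s))
    for s in range(100)
]
```

Comparing the clustering coefficient of the observed network with the distribution in `null_values` indicates whether the observed grouping exceeds what the degree sequence alone would produce.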

RV144 Case:Control Study Data Processing 

RV144 case:control study data included results from 281 patients, including
101 Ab features and functional parameters. Specific features used within
this analysis are documented in Table S3. Subjects were categorized into
four groups for all analyses: placebo infected, placebo uninfected, vaccine
infected, and vaccine uninfected. Because IgG3 and IgG
V1V2 levels were highly correlated (Figure S3), vaccinees were classified
based on their IgG V1V2 and IgA levels. A low-risk or high-risk group was
defined as the region of the IgG V1V2 versus IgA plot that contained the
fewest cases or the fewest controls, respectively, in a mutually exclusive manner.
The percentage difference between infected versus uninfected vaccinees
was defined as

$$P_r = \left(\frac{I_r}{I_t} - \frac{U_r}{U_t}\right) \times 100\%$$

where, for any given region, r, the percentage of infected people, $I_r$, over the
infected population, $I_t$, was calculated, as well as the corresponding percentage
for uninfected individuals ($U_r$ over $U_t$).
The enriched region, with the highest P, was defined as the high-risk group,
whereas the region with the lowest P was defined as the low-risk group.
Fisher’s exact test was used to estimate the significance of the enriched region
from a null hypothesis.
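The region statistic and its significance test can be sketched as follows. The formula used here, the infected fraction in the region minus the uninfected fraction in the region, is an assumption inferred from the verbal definition above; the sign convention and the region counts are illustrative, and only the totals of 40 infected and 201 uninfected vaccinees come from the text.

```python
from math import comb


def percentage_difference(inf_in_region, inf_total, uninf_in_region, uninf_total):
    """Assumed form: 100 * (I_r / I_t - U_r / U_t)."""
    return 100.0 * (inf_in_region / inf_total - uninf_in_region / uninf_total)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: infected / uninfected. Columns: inside / outside the region.
    Sums the probabilities of all tables no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x infected subjects inside the region.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical region holding 2 of 40 infected and 60 of 201 uninfected vaccinees.
p_diff = percentage_difference(2, 40, 60, 201)
p_value = fisher_exact_two_sided(2, 38, 60, 141)
```

With this sign convention a strongly negative value marks a region depleted of infections; the opposite convention would simply flip the sign.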

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures,
three figures, and three tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.10.027.
SUMMARY

Protein translation typically begins with the recruitment
of the 43S ribosomal complex to the 5' cap
of mRNAs by a cap-binding complex. However,
some transcripts are translated in a cap-independent
manner through poorly understood mechanisms.
Here, we show that mRNAs containing N⁶-methyladenosine
(m⁶A) in their 5' UTR can be translated in
a cap-independent manner. A single 5' UTR m⁶A
directly binds eukaryotic initiation factor 3 (eIF3),
which is sufficient to recruit the 43S complex to
initiate translation in the absence of the cap-binding
factor eIF4E. Inhibition of adenosine methylation
selectively reduces translation of mRNAs containing
5' UTR m⁶A. Additionally, increased m⁶A levels in the
Hsp70 mRNA regulate its cap-independent translation
following heat shock. Notably, we find that
diverse cellular stresses induce a transcriptome-wide
redistribution of m⁶A, resulting in increased
numbers of mRNAs with 5' UTR m⁶A. These data
show that 5' UTR m⁶A bypasses 5' cap-binding
proteins to promote translation under stresses.

INTRODUCTION 

For most cellular mRNAs, the first step of mRNA translation
involves recognition of the 5' 7-methylguanosine (m⁷G) cap by
eukaryotic initiation factor 4E (eIF4E), which is a subunit of the
heterotrimeric eIF4F complex. 5' cap-bound eIF4F then recruits
the small (40S) ribosomal subunit associated with various
translation initiation factors, enabling efficient translation of
eukaryotic mRNAs.

However, some mRNAs are translated in a cap-independent
manner. These capped mRNAs do not require eIF4E and are
translated under basal cellular conditions, as well as conditions
in which eIF4E activity is compromised, such as cellular stress
states, viral infection, and diseases such as cancer (Stoneley
and Willis, 2004). Although viral mRNAs can exhibit
cap-independent translation due to the presence of highly structured
internal ribosome entry site (IRES) motifs in the 5' UTR,
correspondingly complex structures are rarely found in eukaryotic
mRNAs undergoing cap-independent translation (Stoneley and
Willis, 2004). Thus, the mechanism of cap-independent
translation in cellular mRNAs remains poorly understood.

A feature of many eukaryotic mRNAs is N⁶-methyladenosine
(m⁶A), a reversible base modification seen in the 3' UTR, coding
sequence, and 5' UTR (Dominissini et al., 2012; Meyer et al.,
2012). Although the function of m⁶A in 3' UTRs has been explored
(Wang et al., 2014a, 2014b, 2015), the function of m⁶A in 5' UTRs
remains unknown. Here, we show that m⁶A in the 5' UTR
functions as an alternative to the 5' cap to stimulate mRNA
translation. Using both in vitro reconstitution approaches and
translation assays in cellular lysates deficient in eIF4E activity, we
define a unique translation initiation mechanism that does not
require the 5' cap. We show that the m⁶A in the 5' UTR can
bind eukaryotic initiation factor 3 (eIF3). Transcriptome-wide
ribosome profiling analysis indicates that the translation of
5' UTR m⁶A-containing mRNAs is reduced upon depletion of
the m⁶A methyltransferase, METTL3, while mRNAs containing
m⁶A elsewhere within the transcript fail to show this effect. The
importance of 5' UTR m⁶A residues for cellular mRNA translation
is demonstrated by both ribosome profiling analysis and
detection of changes to global m⁶A distribution in 5' UTRs in response
to cellular stress. Thus, 5' UTR m⁶A residues are linked to cellular
stress states and provide a mechanism to bypass the m⁷G cap
requirement for mRNA translation, enabling a cap-independent
mode of translation initiation.

RESULTS 

Ribosomal Initiation Complexes Assemble on
m⁶A-Containing mRNAs Independently of the
Cap-Binding Protein eIF4E

Although m⁶A is predominantly localized near stop codons
and in 3' UTRs in several thousand mRNAs, hundreds of cellular
mRNAs contain m⁶A within their 5' UTR (Linder et al., 2015;
Meyer et al., 2012), and the function of these m⁶A residues is
unknown. Since the 5' UTR is important in regulating translation
initiation, we considered the possibility that 5' UTR-localized
m⁶As might influence this process. On most eukaryotic mRNAs,
translation begins with assembly of a 43S preinitiation complex,
comprising a 40S ribosomal subunit, a eukaryotic initiation
factor 2 (eIF2)-GTP/Met-tRNAi^Met ternary complex, and eIFs 3, 1,
and 1A (Jackson et al., 2010). 43S complexes are typically
recruited to mRNA by a cap-binding complex, eIF4F. eIF4F
consists of three subunits: eIF4E, which binds the m⁷G 5' cap;
eIF4A, an RNA helicase; and eIF4G, a scaffold that also binds
eIF3, thereby recruiting the 43S complex. After attachment,
43S complexes scan to the initiation codon, where they form
48S initiation complexes (Jackson et al., 2010).

To investigate the effect of m⁶A on translation initiation, we
used toeprinting, an approach for reconstituting assembly of
48S complexes on mRNA. In toeprinting, ribosomal complexes
are assembled on mRNA 5' UTRs using purified translational
components (40S subunits, initiation factors, and Met-tRNAi^Met)
(Pestova and Kolupaeva, 2002). Formation of the 48S complex
at the start codon is then monitored by reverse transcriptase-mediated
extension of a [³²P]-labeled primer annealed to
ribosome-bound mRNA. cDNA synthesis is arrested by the 40S
ribosome subunit, yielding characteristic toeprints at its leading
edge, +15 to +17 nt downstream of the initiation codon. This assay
can identify the initiation factors and sequence features of 5'
UTRs that are required for initiation and has been used in
mechanistic studies of viral IRESs (Pestova and Hellen, 2003).

To test the role of m⁶A in 48S complex formation, we
performed toeprinting with 5'-capped mRNAs comprising the
54-nt-long β-globin 5' UTR followed by a short coding sequence,
stop codon, and 3' UTR. Consistent with previous studies
(Pestova and Kolupaeva, 2002), 48S initiation complexes were
detected at the start codon of A-containing mRNA in the presence
of the complete set of eIFs (1, 1A, 2, 3, 4A, 4B, and 4F), and
omission of group 4 eIFs nearly abrogated 48S complex formation
(Figure 1A, compare lanes 2 and 4). This is consistent with the
known role for the eIF4 cap-binding complex in recruiting the
43S complex to mRNA (Gingras et al., 1999).

When we used mRNAs that were in vitro transcribed to
contain m⁶A, we found that 48S complexes readily assembled
after addition of the complete set of eIFs, as was seen with
unmethylated mRNA. However, unlike the unmethylated mRNA,
48S complexes formed on m⁶A-containing mRNA even in the
absence of group 4 eIFs (Figure 1A). Thus, initiation on
m⁶A-containing mRNA is distinct from initiation on mRNA lacking m⁶A and
does not require the eIF4 cap-binding complex.

To further establish the factor requirements for initiation on
m⁶A-containing mRNA, we selectively omitted each initiation
factor and performed toeprinting. These experiments show that
efficient initiation on m⁶A-containing mRNA only requires the
presence of eIFs 1, 1A, 2, 3, and the 40S subunit (Figures 1B
and 1C). 48S complexes that formed on m⁶A-containing mRNA
in the absence of group 4 eIFs were functional, as addition of
the 60S ribosomal subunit, Σaa-tRNAs, and factors required for
subunit joining and elongation (eIF5, eIF5B, eEF2, and eEF1H)
resulted in formation of 80S ribosomes that underwent efficient
elongation and yielded pre-termination complexes at the stop
codon (Figure 1B). Thus, translation-competent 48S complexes
can form on m⁶A-containing mRNA in the absence of eIF4E.

m⁶A Enables Translation in a 5' Cap-Independent
Manner in Cell-Free Extracts

We next asked if m⁶A induces eIF4E-independent translation in
cell-free extracts. To investigate this, we used a HeLa extract
that has low eIF4E activity (Mikami et al., 2006) (Figures S1A
and S1B) and thus provides an ideal system for studying
eIF4E-independent translation. Indeed, addition of a capped,
nonmethylated luciferase-encoding mRNA containing the
β-globin 5' UTR to the HeLa extract did not produce
measurable luciferase activity unless eIF4E was added (Figure 2A).
Thus, cap-dependent translation in this extract is dependent
on exogenous eIF4E.

We next used HeLa extracts to determine if transcripts
containing m⁶A require eIF4E. In contrast to the mRNA containing
exclusively A, 5'-capped mRNA containing 50% m⁶A was readily
translated even in the absence of added eIF4E (Figure 2A). Furthermore,
addition of 1 mM m⁷GpppG, a cap analog that sequesters
cap-binding proteins (Ray et al., 2006), abolished translation of
5'-capped, A-containing mRNA but had no effect on
m⁶A-containing mRNA (Figure 2B). Lastly, A-containing mRNA synthesized
without a cap was not translated, whereas m⁶A-containing,
uncapped mRNA was readily translated (Figure 2C). The increased
translation of m⁶A-containing mRNA in these experiments was
not due to increased stability of m⁶A-containing mRNA, as
RT-qPCR and radiolabeled mRNA stability measurements
indicated similar levels of A- and m⁶A-containing luciferase mRNA
after incubation with HeLa extracts (Figures S1C-S1F).
Collectively, these data indicate that translation of m⁶A-containing
mRNA exhibits marked independence of the 5' cap and eIF4E.

A Single m⁶A Is Sufficient to Induce Cap-Independent
Translation

Since the mRNAs used in the in vitro translation assays have m⁶A
throughout the transcript, it is unclear if the translational effects
are due to m⁶A in the 5' UTR or elsewhere in the mRNA. To
determine the contributions of specific m⁶A residues to
cap-independent translation, we examined mRNAs that only contain m⁶A in
the coding sequence. Uncapped, luciferase-encoding mRNAs
that contained zero m⁶A residues within the 5' UTR showed no
translation, indicating that m⁶A residues in the coding sequence
are unable to induce cap-independent translation (Figure 2D).
However, addition of a single m⁶A residue at the beginning,
middle, or end of the 5' UTR was sufficient to markedly induce
cap-independent translation (Figure 2D).

To determine if a single 5' UTR m6A residue can promote
cap-independent translation, we used uncapped luciferase-encoding
mRNAs that contain m6A as the first transcribed nucleotide
(Supplemental Experimental Procedures). This mRNA contains
a single m6A residue in the 5' UTR, and the remainder of the
As within the transcript are unmethylated. For mRNAs lacking
m6A, negligible luciferase synthesis was detected (Figure 2E).
However, transcripts containing a single 5' m6A were readily
translated (Figure 2E). Notably, the level of translation induced
by a 5' m6A is less than the translation induced by a single
m6A residue located internally within the 5' UTR, which likely
reflects inefficient incorporation of m6A at the first position of
5' m6A-containing transcripts (see Experimental Procedures).
Collectively, these experiments indicate that a single m6A can
induce cap-independent translation.

To determine whether m6A-mediated cap-independent translation
is a specific effect caused by the presence of m6A,
we synthesized uncapped luciferase transcripts containing A,
Figure 1. 5' UTR m6A Enables Ribosome Binding to mRNA in the Absence of Cap-Binding Proteins

(A) 5' UTR methylation permits 48S initiation complex formation in the absence of the group 4 eIFs. In vitro transcribed, capped mRNAs encoding a MVHC
tetrapeptide and containing either A or m6A were incubated with purified mammalian translation initiation components. Subsequent toeprinting analysis using a
radiolabeled primer then revealed whether 48S initiation complexes were formed. Positions of the initiation codon, full-length cDNA, and the 48S complex are
shown on the sides of the panel. Lanes C/T/A/G depict the corresponding DNA sequence. When unmethylated mRNA is used (lanes 1-5), 48S complexes are only
formed when the cap-binding complex eIF4F is present (lanes 2 and 3). When eIF4F is absent, 48S complex formation on unmethylated mRNA is impaired (lanes 4
and 5). However, when mRNA with m6A in the 5' UTR is used, 48S complex formation is observed even in the absence of eIF4F (lanes 9 and 10; compare to lanes 7
and 8 where eIF4F is present).

(B) eIFs 1, 1A, and 3 are required for efficient m6A-induced cap-independent 48S complex formation. Toeprinting assays were performed as in (A) using A- or
m6A-containing mRNAs and in the presence of various translation initiation components as indicated. m6A-containing mRNA exhibits robust 48S complex assembly in
the absence of eIF4F, whereas A-containing mRNA does not (compare lanes 1 and 7). Efficient m6A-mediated 48S complex assembly is also dependent on the
presence of eIFs 1 and 1A, which is consistent with the known roles of these proteins in promoting scanning and AUG recognition (compare lanes 1 with lanes 2, 4,
and 5). Removal of eIF3 also abolishes 48S complex assembly on m6A-containing mRNA (compare lanes 1 and 2), indicating that eIF3 is required for
m6A-mediated 48S complex formation. Addition of 60S subunits, eIF5, eIF5B, eEF1H, eEF2, and aa-tRNAs resulted in the appearance of toeprints corresponding to
pre-termination complexes at the stop codon, indicating that m6A-recruited 48S complexes are fully functional (lane 6).

(C) Omission of eIF2 from toeprinting assays results in the absence of 48S complexes (compare lanes 2 and 3), indicating that eIF2 is required for 48S complex
assembly on m6A-containing mRNA.



m6A, or other modified nucleotides, such as N1-methyladenosine,
2'-O-methyladenosine, pseudouridine, and 5-methylcytosine.
In each case, there was negligible luciferase synthesis
unless m6A was present (Figure S1G).

We next asked if the effect of m6A reflects impaired base pairing
caused by modification of the N6 position (Roost et al., 2015).
However, mRNA containing N6-propargyladenosine, which
contains a slightly larger modification compared to a methyl
group at the N6 position, failed to undergo cap-independent
translation (Figure S1G). Thus, m6A-induced structural changes
are unlikely to account for the cap independence conferred
by m6A.
Figure 2. m6A within the 5' UTR Enables Cap-Independent Translation of mRNA

(A) 5' UTR m6A permits mRNA translation without the need for the cap-binding protein eIF4E. In vitro translation was performed using a HeLa cell extract mixed
with luciferase-encoding, capped mRNA containing either A or m6A. Protein production was measured by quantifying luciferase activity. Cap-dependent
translation is observed from both methylated and unmethylated mRNAs in the presence of eIF4E. However, when eIF4E is absent, only the m6A-containing mRNA
is translated (n = 4; mean ± SD; ***p < 0.0001).

(B) Presence of a 5' cap analog is unable to abolish m6A-induced mRNA translation. Luciferase mRNAs were translated as in (A). 1 mM free cap analog
(m7GpppG) was added to sequester cap-binding proteins. Addition of m7GpppG abolishes cap-dependent translation of unmethylated mRNA (left) but is unable
to abolish the cap-independent translation induced by m6A (right). Levels of luciferase activity are shown relative to capped mRNA + 10 pmole eIF4E (n = 3;
mean ± SD; *p < 0.01, **p < 0.001).

(C) In vitro translation was performed using luciferase-encoding mRNA containing A or 50% m6A and with or without a 5' cap as indicated. While unmethylated,
capped mRNA + 10 pmole eIF4E is robustly translated, the unmethylated, uncapped mRNA fails to be translated. However, m6A-containing mRNA is efficiently
translated even when no 5' cap is present (n = 3; mean ± SD; *p < 0.01).

(D) m6A residues in the coding sequence do not induce cap-independent translation. Uncapped, luciferase-encoding mRNAs containing either the natural
β-globin 5' UTR or a modified β-globin 5' UTR containing either zero, one, or three A residues as indicated were used for in vitro translation assays. Translation of
m6A-containing mRNA with zero A residues in the 5' UTR was markedly diminished, indicating that coding sequence m6A residues are unable to induce
cap-independent translation. However, when a single m6A was added to the 5' UTR, the transcripts were robustly translated. Methylated 5' UTRs with a single A near
the 5' end, the middle (mid), or near the 3' end all showed similar levels of translation (n = 3; mean ± SD; **p < 0.001, ***p < 0.0001, ****p < 0.00001). Schematic
shows the distribution of A residues within each β-globin 5' UTR variant (the unmodified β-globin 5' UTR contains 17 A residues).

(E) mRNA with a single m6A within the 5' UTR and no m6As in the remainder of the transcript induces cap-independent translation. Uncapped,
luciferase-encoding mRNAs, which contained either a single adenosine 5'-monophosphate (AMP) or N6-methyladenosine 5'-monophosphate (m6AMP) at the 5' end, were
used for in vitro translation. Only the m6A-containing mRNA was translated, demonstrating that a single 5' end m6A residue is capable of inducing
cap-independent translation (n = 3; mean ± SD; **p < 0.001). The reduced translation efficiency of this mRNA compared to mRNAs with internally methylated 5' UTRs is
likely due to inefficient incorporation of m6A residues at the 5' end by T7 RNA polymerase.

See also Figure S1.



m6A-Induced Translation Initiation Occurs through a
5' End-Dependent Mechanism

Our results indicate that m6A residues within the 5' UTR are
capable of promoting cap-independent translation. However,
the majority of m6A residues are found in the coding sequence
and 3' UTR (Meyer et al., 2012). We therefore asked if these internal
m6A residues can induce internal ribosome entry. To test this,
we synthesized an m6A-containing β-globin mRNA in which the
wild-type AUG initiation codon was removed and two new AUG
triplets were introduced upstream and downstream of the native
Figure 3. m6A-Mediated Translation Occurs
through a 5' End-Dependent Mechanism

(A) Toeprinting assays were performed using a
capped, m6A-containing mRNA containing the
β-globin 5' UTR sequence, which was modified to
include two AUG initiation codons ("AUG1" and
"AUG2" in the schematic). The majority of 48S
complexes were assembled at AUG1, with negligible
levels of 48S complexes detected at AUG2.

(B) Uncapped, A-, or m6A-containing mRNAs
encoding GFP were used for in vitro translation.
The mRNA contains two near-Kozak start codons:
AUG1 encodes the full-length GFP protein, and
internally localized AUG2 encodes an in-frame
truncated (~17 kDa) protein comprising the
C-terminal portion of GFP. Full-length and truncated
GFP protein levels (sizes indicated by arrows)
were measured by western blot. m6A primarily
promotes translation of the full-length
protein and fails to induce internal entry-mediated
translation from AUG2. Levels of the ribosomal
protein RPS6 are shown as a loading control.

(C) Quantification of full-length GFP protein levels
in (B) shows increased protein expression of
methylated mRNA versus unmethylated mRNA
(n = 3; mean ± SD; **p < 0.001).

(D) The presence of a stable hairpin at the beginning
of the 5' UTR to block 5' end entry severely attenuates
m6A-mediated translation (n = 3; mean ± SD;
*p < 0.01).

See also Figure S1.



position (Figure 3A). When this mRNA was incubated with 40S,
eIFs 1/1A/2/3, and Met-tRNAiMet, 48S complexes occurred
almost exclusively at the first AUG, with very low levels of detectable
48S complex formation at the downstream AUG (Figure 3A).
These data suggest that m6A preferentially induces translation at
the first suitable start codon in the mRNA as opposed to promoting
translation through an internal entry-based mechanism.

Next, we used HeLa cell lysates to in vitro translate a GFP
reporter mRNA containing an internal near-Kozak AUG in addition
to the natural AUG encoding full-length GFP. However, we
failed to observe m6A-mediated translation of the ~17 kDa product
produced from the internal AUG and instead observed robust
translation of the full-length protein produced from the first AUG
(Figures 3B, 3C, and S1H). These results are consistent with the
toeprinting experiments and suggest that m6A preferentially induces
translation at the first acceptable start codon.

The selective use of the first AUG for translation initiation
suggests a model of m6A-mediated initiation that involves a 5'
end-dependent scanning mechanism as opposed to internal
ribosomal entry. A similar mode of initiation, which is also cap
independent but shows 5'-end dependence, was recently
described for mRNA containing in its 5' UTR an eIF4G-binding
viral IRES-domain (Terenin et al., 2013). Additionally, cap-independent,
5' end-dependent mechanisms of translation initiation
have previously been observed in assays using rabbit reticulocyte
lysates (De Gregorio et al., 1998). To test directly whether
m6A promotes entry through the 5' end, we used an uncapped,
luciferase-encoding mRNA that contains a stable hairpin at the
extreme 5' end of the mRNA to block 5' end-dependent ribosome
entry. We found that the presence of this hairpin markedly
reduced the robust translation of m6A-containing mRNA that is
normally observed (Figure 3D). Thus, m6A-mediated initiation requires
an accessible 5'-terminal end on the mRNA. Taken
together, these data indicate that 5' UTR m6As are distinct
from classical viral IRES elements since m6A promotes recruitment
of ribosomal preinitiation complexes to the 5' end of
mRNA, rather than enabling internal ribosome entry.

eIF3 Selectively Binds m6A-Containing RNA

We next asked how m6A is recognized to induce translation of
mRNAs. The in vitro 48S reconstitution assays showed that
recruitment of the 43S preinitiation complex to m6A-containing
mRNA only requires eIFs 1, 1A, 2, and 3 and the 40S subunit.
Thus, one of these components binds m6A.

To test which of these factors interacts with m6A, we used an
m6A crosslinking assay in which a [32P]-labeled RNA probe
containing a single A or m6A in its naturally occurring GAC
context was UV-crosslinked to each translational component.
Crosslinked proteins were then detected by SDS-PAGE and
autoradiography.

eIFs 1, 1A, and 2 and the 40S subunit showed equal levels of
crosslinking to the A- and m6A-containing probes (Figures 4A
and S2A). However, crosslinking of eIF3 to the m6A-containing
probe was substantially increased compared to the A-containing
probe, suggesting that this factor constitutes the major
m6A-binding activity of the 43S complex (Figures 4A, S2B, and S2C).

The preferential binding of eIF3 to m6A was not affected by
changing the position of the m6A together with its context nucleotides
within the probe (Figure S2D). However, when the natural
nucleotide context of m6A was changed from GAC to UAC or
Figure 4. The 43S Complex Component
eIF3 Binds m6A

(A) Indicated proteins/protein complexes were
incubated with radiolabeled A- or m6A-containing
RNA probes and crosslinked. Unbound RNAs
were then removed with RNase I, proteins were
separated by SDS-PAGE, and radioactively
labeled RNAs were detected. eIF1, eIF1A, eIF2,
and the 40S ribosomal subunit show no preferential
crosslinking to methylated RNA. However,
eIF3 preparations exhibit strong crosslinking to
methylated RNA at bands around 60 kD, 80 kD,
and 110-160 kD, which correspond to multiple
subunits of the eIF3 complex as indicated.

(B) Crosslinking assays were performed as in (A)
using the HeLa cell extracts utilized in in vitro
translation assays. The eIF3 complex was immunoprecipitated
using antibodies against eIF3a or
eIF3b, and proteins containing crosslinked RNA
were detected. Both eIF3 antibodies precipitated
proteins that preferentially crosslinked to m6A
RNA. Immunoprecipitations using rabbit and
mouse IgG control antibodies are shown as
negative controls. Western blotting for the indicated
proteins indicates their enrichment following
immunoprecipitation (bottom). The input lanes
throughout have 25% of the material loaded for
the IP lanes.

See also Figures S2 and S3. 



CAG, the m6A-containing probe showed significantly reduced
crosslinking to eIF3 (Figure S2E). Thus, efficient eIF3 crosslinking
to m6A-containing RNA occurs when the probe contains m6A
within its natural sequence context. Furthermore, when we subjected
mRNAs that contained a single m6A residue within their 5'
UTR to in vitro translation, we found that m6A residues in a GAC
context promoted robust cap-independent translation, whereas
m6As in a UAC or CAG context exhibited markedly reduced translation
(Figure S2F). These data indicate that eIF3 preferentially binds
to m6A residues in their natural sequence context to promote
cap-independent translation.

eIF3 is a large multiprotein complex comprising 13 subunits
(a-m) (des Georges et al., 2015) that interacts with mRNA in
48S complexes (Pisarev et al., 2008). UV-crosslinking studies
showed that the interaction between eIF3 and RNA occurs at a
multisubunit interface (Lee et al., 2015). Similarly, in our crosslinking
assays, the m6A-containing probe induced strong labeling
of several protein bands, ranging in molecular weight from
~60 to ~160 kDa (Figures 4A and S2A-S2E). Particularly strong
labeling was observed in the area of ΔeIF3a/eIF3c, ΔeIF3c, and
eIF3d/eIF3l (Figures S2A-S2E). These data suggest that
m6A-containing RNA may interact with a multisubunit interface within
eIF3.

To further explore the binding of m6A-containing RNA to eIF3,
we used HeLa cell lysates. Crosslinking using a radioactive
m6A-containing RNA probe resulted in the labeling of specific
protein bands that were increased relative to the A-containing
probe (Figure 4B). Immunoprecipitation of crosslinked extracts
using either of two eIF3 subunit-specific antibodies selectively
precipitated these bands, confirming that the increased binding
to m6A-containing RNA was mediated by eIF3. Immunoprecipitation
with a control antibody recognizing a different
initiation-factor-associated protein (ABCF1) did not precipitate these
bands (Figures 4B, S3A, and S3B). Thus, these data further suggest
that m6A-containing RNA interacts with eIF3.

The m6A-binding protein YTHDF1 interacts with a diverse set
of proteins, including eIF3 (Wang et al., 2015). Thus, we considered
the possibility that recruitment of eIF3 to m6A-containing
RNA in the in vitro translation and crosslinking assays is mediated
by a YTH-family m6A-binding protein. However, silver staining
of all the initiation factors used in the toeprinting assays failed
to show protein bands in the ~60-64 kD range of these proteins
(Figure S2). Additionally, mass spectrometry analysis of the
purified eIF3 did not reveal YTH family proteins (Figure S3C)
(des Georges et al., 2015). Finally, YTHDF1 was not present in
the highly purified eIF3 preparations used in our crosslinking
assays, nor were any of the related YTH-domain containing family
of m6A binding proteins (Figure S3D) (des Georges et al.,
2015). Thus, these data support the idea that eIF3 is able to
directly bind m6A.

To determine whether eIF3 binds m6A in cells, we performed
PAR-iCLIP to identify zero-distance binding sites of eIF3 in
cellular mRNAs. eIF3a-binding sites were primarily localized to
5' UTRs of mRNAs and showed a high degree of overlap with
eIF3-binding sites reported previously (Lee et al., 2015) (Figures
S4A and S4B).

To determine whether eIF3a binds to sites of m6A in 5' UTRs,
we evaluated the overlap of eIF3a-binding sites with m6A
residues mapped at single-nucleotide resolution in 5' UTRs
(Linder et al., 2015). To test this, we used a permutation-based
approach in which eIF3a-binding sites were randomized
while preserving the distribution and positional bias of eIF3a
PAR-iCLIP tags in 5' UTRs. Multiple permutations (n > 100)
were used, and the statistical significance of overlap between
eIF3 PAR-iCLIP sites and m6A residues was evaluated (Supplemental
Experimental Procedures). We found a statistically significant
overlap between m6A residues and eIF3-binding sites in
5' UTRs, with 35% of 5' UTR m6A residues overlapping with
eIF3 sites (Figures S4C-S4E). Since single-nucleotide-resolution
m6A mapping distinguishes between m6A residues and the
m6Am residues that exist as part of the 5' cap in some mRNAs
(Kruse et al., 2011; Linder et al., 2015), we were able to determine
that this overlap was specific to m6A residues within 5' UTRs
(Figures S4C and S5A). Taken together, these results support
the idea that eIF3 is associated with m6A residues in the
5' UTRs of cellular mRNAs.
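
The logic of a permutation-based overlap test of this kind can be illustrated with a minimal sketch. The code below is not the procedure used in this study (which is described in the Supplemental Experimental Procedures and preserves the positional bias of PAR-iCLIP tags); it assumes a simpler null model in which each binding site is moved to a uniformly random position within the same 5' UTR. The function names, the 10-nt overlap window, and the data layout are hypothetical choices made for illustration only.

```python
import random

def count_overlaps(binding_sites, m6a_sites, window=10):
    """Count m6A sites lying within `window` nt of a binding site in the same UTR.

    Both inputs are lists of (utr_id, position) tuples (assumed layout).
    """
    hits = 0
    for utr_id, m6a_pos in m6a_sites:
        if any(u == utr_id and abs(pos - m6a_pos) <= window
               for u, pos in binding_sites):
            hits += 1
    return hits

def permutation_p_value(binding_sites, m6a_sites, utr_lengths,
                        n_perm=1000, window=10, seed=0):
    """Empirical one-sided p-value for the observed overlap.

    Null model (a simplifying assumption): every binding site is reassigned
    to a uniformly random position within its own 5' UTR.
    """
    rng = random.Random(seed)
    observed = count_overlaps(binding_sites, m6a_sites, window)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [(u, rng.randrange(utr_lengths[u])) for u, _ in binding_sites]
        if count_overlaps(shuffled, m6a_sites, window) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

With real data, `binding_sites` would hold PAR-iCLIP sites and `m6a_sites` the single-nucleotide m6A calls, both restricted to 5' UTRs. A null model that also preserves positional bias along the UTR, as described in the text, would be more conservative than the uniform shuffle assumed here.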

To further test the physiological association of eIF3 and m6A
predicted by the PAR-iCLIP analysis, we performed eIF3
protein/RNA immunoprecipitation from HEK293 cells expressing
the m6A-demethylating enzyme Fto (Jia et al., 2011). The
abundance of target mRNA 5' UTRs in the eIF3-bound fraction
was then measured using RT-qPCR with primers that amplify
the 5' UTR regions containing the m6A residue. mRNAs that
contain a high stoichiometry m6A site within their 5' UTR (Meyer
et al., 2012) were substantially depleted in the eIF3-bound
fraction following Fto overexpression (Figure 5B). In contrast,
eIF3 immunoprecipitation of a control mRNA deficient in
5' UTR m6A (Meyer et al., 2012) was unaffected by Fto overexpression
(Figure 5B). Taken together, these data support the
idea that eIF3 interacts with mRNAs in an m6A-dependent
manner in cells.

m6A within the 5' UTR Promotes Translation of Cellular
mRNAs

To address whether mRNAs that contain 5' UTR m6A residues
possess enhanced translation in cells, we examined ribosome
profiling-based measurements of mRNA translation efficiency
(TE) in HeLa cells depleted of the m6A methyltransferase
enzyme, METTL3, which results in depletion of all m6A residues
in cells (Wang et al., 2015). We examined the TE of mRNAs based
on the location of their m6A residues identified by
single-nucleotide-resolution m6A mapping (Linder et al., 2015). Compared to
mRNAs that lack m6A, we found that transcripts that contain
m6A residues within the coding sequence or 3' UTR show no
significant change in TE in METTL3-depleted cells (Figures 6A
and 6B). Similarly, mRNAs that contain m6A residues near the
stop codon do not show reduced translation in METTL3-depleted
cells. However, mRNAs containing 5' UTR m6A residues
showed a large reduction in TE following METTL3 depletion,
suggesting a preferential role for 5' UTR m6A in promoting
mRNA translation (Figures 6A, 6B, and S5B). Residual translation
may reflect ongoing cap-dependent translation in METTL3-deficient
cells. The translation of mRNAs containing 5' UTR m6A residues
was not suppressed in cells depleted of YTHDF1 (Figure S5B),
which is consistent with the idea that 5' UTR m6A
promotes translation through eIF3. Taken together, these data
suggest that m6A residues in the 5' UTR enhance the translation
of mRNAs in cells.
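
For readers unfamiliar with the metric, translation efficiency from ribosome profiling is conventionally computed as the ratio of ribosome footprint density to mRNA abundance for each transcript. The following is a minimal sketch of that calculation and of a per-group comparison such as the one described above; the function names, the dictionary layout, and the use of a median summary are illustrative assumptions, not the analysis pipeline used in this study.

```python
import math
from statistics import median

def translation_efficiency(ribo_rpkm, rna_rpkm):
    """TE = ribosome footprint density / mRNA abundance (both in RPKM)."""
    return ribo_rpkm / rna_rpkm

def log2_te_change(control, knockdown):
    """log2(TE_knockdown / TE_control); each argument is (ribo_rpkm, rna_rpkm)."""
    return math.log2(translation_efficiency(*knockdown)
                     / translation_efficiency(*control))

def median_change_by_group(transcripts):
    """Median log2 TE change per m6A-location group.

    `transcripts` is a list of dicts with the hypothetical keys
    'm6a_location', 'control', and 'knockdown'.
    """
    groups = {}
    for t in transcripts:
        change = log2_te_change(t["control"], t["knockdown"])
        groups.setdefault(t["m6a_location"], []).append(change)
    return {location: median(values) for location, values in groups.items()}
```

A negative median for the 5' UTR group, alongside values near zero for the other groups, would correspond to the pattern reported here for METTL3-depleted cells. In practice, low-count transcripts would need to be filtered first to avoid unstable ratios.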

Heat-Shock-Induced Translation of Hsp70 Is Mediated
by 5' UTR m6A

We next sought to investigate the role of m6A in promoting
cap-independent translation in cells. Since cellular translation involves
both cap-dependent and cap-independent mechanisms,
we took advantage of heat shock, which induces a stress
response that suppresses most cap-dependent translation (Holcik
and Sonenberg, 2005). Heat-shock protein 70 (HSP70) is a
stress response mRNA known to undergo increased transcription
and cap-independent translation following heat shock (Lindquist
and Craig, 1988). Previous studies demonstrated that
HSP70 contains an m6A site within its 5' UTR (Schwartz et al.,
2014) and that methylation of the HSP70 5' UTR is increased
following heat shock (Dominissini et al., 2012). However, the
role of m6A in cap-independent translation of HSP70 is not
understood.

To test the effect of m6A in HSP70 translation, we utilized
altered expression of Fto to influence m6A levels within the
Hsp70 5' UTR. Knockdown of Fto resulted in increased m6A
levels in Hsp70 mRNA in heat-shocked cells (Figure S6A).
Conversely, overexpressing Fto in heat-shocked cells reduced
the level of m6A in Hsp70 mRNA by 29% relative to heat-shocked
cells overexpressing GFP (Figure S6A). To determine whether
altered m6A levels in the Hsp70 5' UTR influence heat shock-induced
Hsp70 translation, we used mouse embryonic fibroblasts
(MEFs), which exhibit low Hsp70 levels prior to heat shock
(Sun et al., 2011). In MEF cells stably expressing control shRNA,
Hsp70 protein was readily detected 4 and 6 hr after heat shock.
However, in MEF cells stably expressing Fto-specific shRNA to
increase m6A levels, Hsp70 protein expression was significantly
higher at both 4 and 6 hr after heat shock (Figure 6C). This effect
was not due to increased levels of Hsp70 mRNA (Figure S6B).
Furthermore, knockdown of Fto caused a significant increase
in the fraction of polysome-bound Hsp70 mRNA (Figure 6D),
suggesting that the increased levels of Hsp70 protein seen after
heat shock reflect increased translation of Hsp70 mRNA in Fto
knockdown cells.

Consistent with the effects of Fto knockdown on Hsp70 levels,
Fto overexpression caused significantly reduced Hsp70 protein
production 4 and 6 hr after heat shock (Figure 6E). This effect
was not due to reduced Hsp70 transcript levels (Figure S6B).
In addition, Hsp70 mRNA was significantly reduced in the polysome
fractions of Fto-overexpressing cells compared to GFP-expressing
cells, confirming that the Fto-mediated reduction in
Hsp70 protein levels was due to reduced Hsp70 translation (Figure 6F).
These data suggest that the loss of m6A in Hsp70 mRNA
results in reduced translation efficiency following heat shock.

Transcriptome-wide Redistribution of m6A following
Cellular Stress

We next sought to further understand the importance of 5' UTR
m6A residues in response to cellular stress. Based on our findings
with Hsp70 mRNA, we considered the possibility that heat
shock may alter the transcriptome-wide distribution of m6A.
Under basal conditions, most m6A residues are located in
Figure 5. eIF3 Binding Sites within Cellular mRNAs Localize to Sites of m6A Residues within the 5' UTR

(A) Shown are read clusters from both eIF3 PAR-iCLIP (light blue) and single-nucleotide-resolution m6A mapping (Linder et al., 2015) (miCLIP; red) for four
representative mRNAs (EIF4A3, H3F3C, SQLE, and IER5). eIF3a PAR-iCLIP read clusters exhibit highly specific overlap with m6A mapping clusters at internal
positions within 5' UTRs. This co-localization is specific to 5' UTRs, as mRNAs that contain multiple m6A residues in the CDS or 3' UTR fail to show eIF3a binding
at these sites (exemplified by IER5). Red asterisks indicate the location of individual m6A sites identified at single-nucleotide resolution.

(B) eIF3 binds to the 5' UTR of cellular mRNAs in an m6A-dependent manner. HEK293 cells were transfected with GFP- or Fto-overexpression plasmids, and eIF3
immunoprecipitation was performed to isolate eIF3-bound mRNAs. Bound mRNAs were quantified by RT-qPCR using 5' UTR-specific primers. 5' UTRs of
mRNAs that contain high levels of m6A exhibited reduced binding to eIF3 after overexpression of Fto. 5' UTRs that do not contain m6A exhibited no change in eIF3
binding following Fto overexpression (n = 3; mean ± SEM).

See also Figures S4 and S5. 



mRNAs near the stop codon, with markedly fewer m6A residues
in 5' UTRs. To determine if cellular stress alters the characteristic
distribution of m6A, we mapped m6A residues using miCLIP, a
method for single-nucleotide resolution detection of m6A sites
(Linder et al., 2015). Remarkably, the metagene analysis showed
a marked enrichment of m6A in the 5' UTR in heat-shocked cells
compared to control cells (Figure S6C).

To further examine this phenomenon, we analyzed existing
transcriptome-wide m6A mapping datasets that were performed
in stressed cells and control cells. These include HepG2 cells
treated with UV, interferon-γ, and heat shock (Dominissini
et al., 2012). Metagene analyses showed prominent increases
in the level of 5' UTR m6A in both the UV-treated and heat-shocked
cells (Figure S6D). The number of m6A sites in the
3' UTR was relatively unaffected following heat shock or UV
compared to control (n = 4,538, 4,533, or 3,171, respectively),
whereas the number of m6A sites in the 5' UTR was markedly
increased in heat shock and UV relative to control (n = 1,501,
1,212, or 326, respectively) (Table S1). Notably, interferon-γ
treatment did not alter the m6A metagene profile (Figure S6D),
indicating that the induction of 5' UTR m6A is not a nonspecific
stress response but instead is linked to specific forms of cellular
stress.
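
The site counts quoted above can be turned into fold changes relative to control with a few lines of arithmetic. The sketch below uses only the numbers given in the text, assigned to conditions in the order implied by "respectively"; the dictionary and function names are hypothetical. The counts are raw and not normalized for sequencing depth or total peak number, so the ratios are a rough comparison only.

```python
# m6A site counts per region and condition, as quoted in the text (Table S1).
SITE_COUNTS = {
    "3' UTR": {"heat shock": 4538, "UV": 4533, "control": 3171},
    "5' UTR": {"heat shock": 1501, "UV": 1212, "control": 326},
}

def fold_changes(region_counts):
    """Return stress/control ratios of site counts for one region."""
    control = region_counts["control"]
    return {condition: n / control
            for condition, n in region_counts.items()
            if condition != "control"}

if __name__ == "__main__":
    for region, region_counts in SITE_COUNTS.items():
        for condition, ratio in fold_changes(region_counts).items():
            print(f"{region}, {condition} vs. control: {ratio:.2f}-fold")
```

On these raw counts the 5' UTR ratios come to about 4.6-fold (heat shock) and 3.7-fold (UV), whereas the 3' UTR ratios are about 1.4-fold in both conditions.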

Intriguingly, both heat shock and UV caused increased 5' UTR
methylation in mRNAs that belong to common functional
pathways, including phosphorylation and cell-cycle regulation
(Table S1). Collectively, our results indicate that activation of
some stress-response pathways causes a global reshaping of
the cellular mRNA methylome and suggest that increased 5' UTR
methylation may be a general component of the response to
select cellular stresses. Future studies will be important for
understanding how stress pathways increase m6A within the
5' UTR of mRNAs and reshape the RNA methylome. Furthermore,
it will be important to analyze how diverse stress response
pathways utilize these upregulated 5' UTR m6A residues to
mediate translational responses.

DISCUSSION 

Eukaryotic mRNAs can be translated in both cap-dependent and
cap-independent modes, although the mechanisms of translation
initiation that do not require the 5' cap and eIF4E have
been poorly understood. Our results show that m6A residues
within the 5' UTR can act as an m6A-induced ribosome engagement
site (MIRES), which promotes cap-independent translation
of mRNA. We find that a single m6A in the 5' UTR of mRNAs is
sufficient to promote MIRES activity in cell-free extracts,
whereas m6A residues outside the 5' UTR fail to show this effect.
The significance of 5' UTR m6A residues is further seen in both
ribosome profiling datasets and in individual cellular mRNAs in
conditions where cap-dependent translation is suppressed.
These results point to selective recognition of 5' UTR m6A as a
mechanism for mRNAs to bypass the cap requirement for translation
and suggest a potential role for this class of m6A residues
in mediating translational responses induced in diverse cellular
stress states.

A role for m6A in promoting translation initiation is supported
by our finding that METTL3 depletion leads to a large reduction
in translation efficiency of mRNAs containing 5' UTR m6A residues
compared to mRNAs that contain m6As elsewhere.
Although cap-independent translation of cellular mRNAs may
also be mediated by m6A-independent pathways, including
direct recruitment of ribosomes to internal 5' sequence or structural
elements (Xue et al., 2015), our studies raise the intriguing
possibility that an eIF4E-independent mode of translation initiation
can be switched on or off by reversible methylation of adenosine
residues in the 5' UTR of mRNAs.

Our studies show that cap-independent translation mediated
by m6A requires a novel m6A reader, eIF3. We find that many
eIF3-binding sites in the transcriptome overlap with m6A sites
in 5' UTRs. The identification of eIF3 as an m6A reader was originally
suggested by the finding that the 48S complex can be
assembled on m6A-containing RNA using only eIF1, eIF1A,
eIF2, eIF3, and the 40S subunit. Of these components, eIF3
shows selective interaction with m6A both in vitro and in cells.
By binding eIF3, 5' UTR m6A residues can stimulate translation
initiation by directly recruiting the 43S preinitiation complex to
the 5' UTR of mRNAs.

m6A has diverse effects on mRNAs, including mRNA destabilization
and translational enhancement, although these effects are
predominantly attributed to m6A near stop codons or in 3' UTRs
(Wang et al., 2014a, 2015). In the case of m6A near stop codons
or in 3' UTRs, translational enhancement is mediated by YTHDF1,
which binds to select transcripts at m6A sites in their 3' UTRs
(Wang et al., 2015). YTHDF1 binds numerous proteins, including
eIF3 and other ribosome-associated proteins, which are proposed
to be recruited to 3' UTRs to influence cap-dependent translation
initiation (Wang et al., 2015). This is in contrast to the mechanism of
5' UTR m6A, which directly recruits eIF3 and assembles translation
initiation complexes in the 5' UTR without cap-binding proteins.
Our analysis of ribosome profiling data from YTHDF1-depleted
cells further indicates that 5' UTR m6A residues promote translation
through a YTHDF1-independent mechanism. Thus, m6A exhibits
markedly distinct effects on mRNA based on its location in
transcripts.

A long-standing question is the mechanism by which select
cellular mRNAs undergo cap-independent translation during
conditions where cap-dependent translation is suppressed (Holcik
and Sonenberg, 2005). A prevailing hypothesis has been that
these mRNAs contain cellular IRESs that promote cap-independent
translation (Komar and Hatzoglou, 2011). However, putative
cellular IRESs often lack the complex structural elements seen
in viral IRESs (Hellen and Sarnow, 2001). As a result of this
discrepancy, and because of flaws inherent to many assays
that test cellular IRES function, the evidence for and against
cellular IRESs is a frequent topic of debate (Gilbert, 2010;
Kozak, 2005). Given the prevalence of m6A within 5' UTRs,
their translation-promoting activity represents an additional or
perhaps alternative mechanism for mediating cap-independent
translation.

The importance of 5' UTR m6A residues is supported by their
selective upregulation in response to specific forms of stress.
This m6A stress response points to the importance of this subset
of m6A residues, which our results show are linked to cap-independent
translation. Notably, other forms of stress regulate
translation through the integrated stress response (Ron, 2002).
It will be important to determine if 5' UTR m6A-mediated translation
is an alternative mechanism to orchestrate translational
responses to stress.

EXPERIMENTAL PROCEDURES 
In Vitro Translation 

In vitro translation assays were performed using HeLa cell extracts (One-Step
Human IVT Kit, Thermo Scientific). Equal amounts of RNA were used for each
reaction (100 ng RNA per reaction, ~30 nM per reaction), and all reactions
within each experiment were performed in equal volumes. Multiple different
batches of HeLa extracts and mRNA preparations were used to ensure that
the translation-promoting effect of m6A is not due to a specific lot of extract
or batch of synthesized mRNA. However, this also contributes to inter-experiment
variability. Reactions were performed at 30°C for 30 min and were
stopped by the addition of 200 µM cycloheximide and placed on ice. 1 µl of
each reaction was then used for luminescence analysis (see below). The
remaining reaction volume was used for RNA isolation with TRIzol (Invitrogen)
or QIAGEN RNeasy kits according to the manufacturer’s instructions. cDNA
synthesis was then performed using SuperScript III reverse transcriptase
(Invitrogen) and random hexamers. Following treatment with RNase H, cDNA
was then used for RT-qPCR analysis to ensure that differences in mRNA levels
across samples did not account for the observed changes in protein production.
Statistical analysis of luciferase activity measurements (below) was performed
using Student’s t test and a p value threshold of 0.01.

Luciferase Activity Measurements 

Luciferase expression was measured using the One-Glo luciferase assay kit 
(Promega) according to the manufacturer’s instructions. Luminescence 
measurements were performed on a Molecular Devices Spectramax L micro- 
plate reader using the SoftMax Pro software program. 
Figure 6. m6A Mediates Stress-Induced Translation of Hsp70

(A) Depletion of the m6A methyltransferase, METTL3, decreases the TE of mRNAs with 5' UTR m6A. Ribosome profiling data from HeLa cells expressing METTL3
or control siRNAs (Wang et al., 2015) were used to determine changes in TE for various classes of mRNAs defined by single-nucleotide-resolution m6A mapping.
Compared to nonmethylated mRNAs (blue), transcripts with m6A residues in the coding sequence (CDS) or 3' UTR (green) exhibit only a marginal decrease in TE.
However, mRNAs containing m6A within the 5' UTR (red) show a large reduction in TE. p values were calculated using the Mann-Whitney test.

(B) TEs of various classes of m6A-containing mRNAs were analyzed using ribosome profiling datasets from HeLa cells as described in (A). Shown are the mean
fold changes in TE (siMETTL3/siControl) for mRNAs with m6A residues only in the 5' UTR (red), within the 3' UTR (purple), within 50 nt of the stop codon (yellow),
within the CDS and/or 3' UTR (green), or in all mRNAs (blue), as defined by single-nucleotide-resolution m6A mapping. mRNAs with 5' UTR m6A residues exhibit a
dramatic reduction in TE after METTL3 depletion, whereas transcripts with m6As in other regions fail to show this effect. All mean fold change TE values were
computed after background subtraction of the mean fold change computed from all nonmethylated control mRNAs, as indicated by the arrow (mean ± SEM;
*p < 0.05).

(C) Fto knockdown increases heat-shock-induced translation of Hsp70. MEF cells stably expressing either Fto shRNA or scramble shRNA were subjected to heat
shock stress. Cell lysates were collected at various times post-heat shock (“Post HS”) and then used for western blot analysis with the indicated antibodies. Fto
knockdown increased the levels of stress-induced Hsp70 protein compared to control shRNA (“S exp” = short exposure; “L exp” = long exposure). Levels of
Hsp25, another heat shock-induced protein, were unaffected by Fto knockdown. Right panel shows quantification of Hsp70 levels normalized to β-actin (n = 3;
mean ± SEM; **p < 0.1).

(D) MEFs stably expressing control or Fto shRNA were subjected to heat shock stress as in (C). Polysome fractions were separated using sucrose gradient 
fractionation (left panels) followed by RT-qPCR for Hsp70 (top right panel) and Gapdh (bottom right panel) in each fraction. Hsp70 levels are increased in 
polysome fractions following Fto knockdown, whereas the distribution of Gapdh is unchanged (n = 3; mean ± SEM; Hsp70: p = 0.0007, two-way ANOVA; Gapdh: 
p = 0.3722, two-way ANOVA considering the entire range of time points). 

(E) MEF cells were infected with either GFP or Fto lentivirus and subjected to heat shock stress. Cell lysates were collected at various times post-heat shock and
then used for western blot analysis with the indicated antibodies. Fto overexpression decreased the levels of heat-shock-induced Hsp70 protein compared to
GFP overexpression. Levels of Hsp25 were unaffected by Fto overexpression. Right panel shows quantification of Hsp70 levels normalized to β-actin (n = 3;
mean ± SEM; *p < 0.5).
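
The background subtraction described in panel (B) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: TE is taken here to be ribosome footprint density divided by mRNA abundance (the legend does not define it), the function names are hypothetical, and the fold-change values are invented for the example.

```python
from statistics import mean

def translation_efficiency(footprint_density, mrna_abundance):
    """Assumed definition: TE = ribosome footprint density / mRNA abundance."""
    return footprint_density / mrna_abundance

def background_subtracted_mean(class_fold_changes, control_fold_changes):
    """Mean TE fold change of an mRNA class minus the mean fold change
    of the nonmethylated control mRNAs."""
    return mean(class_fold_changes) - mean(control_fold_changes)

# Invented TE fold changes (siMETTL3 / siControl), for illustration only.
five_prime_utr_m6a = [0.70, 0.80, 0.75]
nonmethylated = [1.00, 0.95, 1.05]

delta = background_subtracted_mean(five_prime_utr_m6a, nonmethylated)
print(f"background-subtracted mean fold change: {delta:.2f}")
```

A negative value indicates that the class loses more TE after METTL3 depletion than nonmethylated mRNAs do. Whether the original analysis worked on raw or log-transformed ratios is not stated in the legend.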

eIF3a PAR-iCLIP

eIF3a PAR-iCLIP was performed using HEK293T cells as described previously
(Huppertz et al., 2014) with some adjustments. 10 million cells were incubated
with 100 mM 4SU for 8 hr. Media was then discarded, and cells were placed on
ice and irradiated with 365 nm UV light using a Stratalinker UV crosslinker
(Stratagene) with 150 mJ/cm². Cells were scraped in ice-cold 1× PBS and
collected by centrifugation at 200 × g for 10 min at 4°C. Cell pellets were
suspended in 200 µl of 1% SDS, 10 mM DTT, and 1× protease inhibitors
(complete mini EDTA-free, Roche). The lysate was then passed through an
18G needle 10 times to improve cell lysis and shearing of DNA. SDS was
neutralized by diluting the lysate to 2 ml using RIPA buffer without SDS. The
remainder of the protocol was performed as described (Huppertz et al.,
2014) using rabbit anti-eIF3a (Abcam).

Additional methods are detailed in the Supplemental Experimental Proce- 
dures section. 
SUMMARY

There is substantial heterogeneity among primary
prostate cancers, evident in the spectrum of molecular
abnormalities and its variable clinical course. As
part of The Cancer Genome Atlas (TCGA), we present
a comprehensive molecular analysis of 333 primary
prostate carcinomas. Our results revealed a molecular
taxonomy in which 74% of these tumors fell into
one of seven subtypes defined by specific gene fusions
(ERG, ETV1/4, and FLI1) or mutations (SPOP,
FOXA1, and IDH1). Epigenetic profiles showed substantial
heterogeneity, including an IDH1 mutant subset
with a methylator phenotype. Androgen receptor
(AR) activity varied widely and in a subtype-specific
manner, with SPOP and FOXA1 mutant tumors having
the highest levels of AR-induced transcripts.
25% of the prostate cancers had a presumed actionable
lesion in the PI3K or MAPK signaling pathways,
and DNA repair genes were inactivated in 19%. Our
analysis reveals molecular heterogeneity among primary
prostate cancers, as well as potentially actionable
molecular defects.

INTRODUCTION 

Prostate cancer is the second most common cancer in men and 
the fourth most common tumor type worldwide (Ferlay et al., 
2013). It is estimated that, in 2015, prostate cancer will be diag- 
nosed in 220,800 men in the United States alone and that 27,540 
will die of their disease (Siegel et al., 2015). Multiple genetic and
demographic factors, including age, family history, genetic sus- 
ceptibility, and race, contribute to the high incidence of prostate 
cancer (Al Olama et al., 2014). 

In the current era of prostate-specific antigen (PSA) screening, 
nearly 90% of prostate cancers are clinically localized at the time 
of their diagnosis (Penney et al., 2013). The clinical behavior of 
localized prostate cancer is highly variable: while some men
will have aggressive cancer leading to metastasis and death 
from the disease, many others will have indolent cancers that 
are cured with initial therapy or may be safely observed. Multiple 
risk stratification systems have been developed, combining the 
best currently available clinical and pathological parameters 
(such as Gleason score, PSA levels, and clinical and pathological 
staging); however, these tools still do not adequately predict 
outcome (Cooperberg et al., 2009; D’Amico et al., 1998; Kattan
et al., 1998). Further risk stratification using molecular features
could potentially help distinguish indolent from aggressive prostate
cancer.

Molecular and genetic profiles are increasingly being used
to subtype cancers of all types and to guide selection of more
precisely targeted therapeutic interventions. Several recent
studies have explored the molecular basis of primary prostate
cancer and have identified multiple recurrent genomic alterations
that include mutations, DNA copy-number changes, rearrangements,
and gene fusions (Baca et al., 2013; Barbieri et al.,
2012; Berger et al., 2011; Lapointe et al., 2007; Pflueger
et al., 2011; Taylor et al., 2010; Tomlins et al., 2007; Wang
et al., 2011). The most common alterations in prostate cancer
genomes are fusions of androgen-regulated promoters
with ERG and other members of the E26 transformation-specific
(ETS) family of transcription factors. In particular, the TMPRSS2-ERG
fusion is the most common molecular alteration in prostate
cancer (Tomlins et al., 2005), being found in between 40% and
50% of prostate tumor foci, translating to more than 100,000
cases annually in the United States (Tomlins et al., 2009). Nevertheless,
among treated prostate cancers, and despite extensive
study, affected individuals with fusion-bearing tumors do not
appear to have a significantly different prognosis following
prostatectomy than those without (Gopalan et al., 2009; Pettersson
et al., 2012). Prostate cancers also have varying degrees of
DNA copy-number alteration; indolent and low-Gleason tumors
have few alterations, whereas more aggressive primary and metastatic
tumors have extensive burdens of copy-number alteration
genome wide (Taylor et al., 2010; Hieronymus et al.,
2014; Lalonde et al., 2014). In contrast, somatic point mutations
are less common in prostate cancer than in most other solid tumors.
The most frequently mutated genes in primary prostate
cancers are SPOP, TP53, FOXA1, and PTEN (Barbieri et al.,
2012). Only recently has the spectrum of epigenetic changes in
prostate cancer genomes been explored (Börno et al., 2012;
Friedlander et al., 2012; Kim et al., 2011; Kobayashi et al.,
2011; Mahapatra et al., 2012).

Importantly, no studies have comprehensively integrated 
diverse omics data types to assess the robustness of previously 
defined subtypes and potentially prognostic alterations. Here, to 
gain further insight into the molecular-genetic heterogeneity of 
primary prostate cancer and to establish a molecular taxonomy 
of the disease for future diagnostic, prognostic, and therapeutic 
stratification, the TCGA Network has comprehensively charac- 
terized 333 primary prostate cancers using seven genomic 
platforms. This analysis reveals novel molecular features that 
provide a better understanding of this disease and suggest 
potential therapeutic strategies. 
Table 1. Cohort Characteristics

Clinical Feature
Age                                 61 (43-76)
Pre-operative PSA                   7.4 (1.6-87.0)
Gleason Score
  3+3                               65
  3+4                               102
  4+3                               78
  ≥8                                88
Tumor Cellularity (pathology)
  ≤20%                              7
  21-40%                            40
  41-60%                            84
  61-80%                            115
  81-100%                           87
Pathologic Stage
  pT2a/b                            18
  pT2c                              111
  pT3a                              110
  pT3b                              82
  pT4                               6
PSA Recurrence
  Yes                               33
  No^a                              248
  Not available                     47
Margin Status
  Positive                          69
  Negative                          193
  Not available                     71
Ethnicity
  Caucasian                         270
  African descent                   43
  Asian                             8
  Not available                     12

^a Either no evidence of recurrence or insufficient follow-up.



RESULTS 

Cohort and Platforms 

The cohort of primary prostate cancers analyzed resulted
from extensive pathologic, analytical, and quality control review,
yielding 333 tumors from 425 available cases. Images of frozen
tissue were evaluated by multiple expert genitourinary pathologists,
and cases were excluded if no tumor cells were identifiable
in the sample or if there was evidence of significant RNA degradation
(Figure S1; Supplemental Experimental Procedures). For
the subset of cases reviewed by two pathologists, tumor cellularity
estimates were within 20% of each other in 71% of cases.
In total, 78% of Gleason scores were concordant within one
grade of the secondary pattern (Supplemental Experimental Procedures).
Moreover, due to the challenge of acquiring primary
prostate cancer specimens of high tumor cellularity, we also
performed a multi-platform analysis of tumor content, estimating
tumor purity with analytical approaches utilizing both DNA
(Carter et al., 2012; Prandi et al., 2014) and RNA (Quon et al.,
2013; Ahn et al., 2013) sequencing data. The molecular and
pathologic estimates are presented in Table S1A and Figure S1.
The clinical and pathological characteristics of the final cohort
are presented in Table 1. The average follow-up time following
radical prostatectomy was just under 2 years, which precluded
outcomes analysis due to the long natural history of primary
prostate cancer.
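
As a rough illustration of how an inter-pathologist concordance figure of this kind could be computed, the sketch below counts the fraction of paired cellularity estimates that differ by no more than a tolerance. It assumes "within 20%" means 20 percentage points; the function name and the paired values are hypothetical and are not cohort data.

```python
def fraction_within(pairs, tolerance):
    """Fraction of (a, b) estimate pairs with |a - b| <= tolerance."""
    if not pairs:
        raise ValueError("no paired estimates supplied")
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return hits / len(pairs)

# Invented tumor cellularity estimates (%) from two reviewers.
paired_cellularity = [(60, 70), (80, 50), (40, 45), (90, 75)]

concordance = fraction_within(paired_cellularity, tolerance=20)
print(f"{concordance:.0%} of cases within 20 points")
```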

We characterized isolated biomolecules from these 333 tumor 
samples using four platforms: whole-exome sequencing for 
somatic mutations, array-based methods for profiling both so- 
matic copy-number changes and DNA methylation, and mRNA 
sequencing. We also performed microRNA (miRNA) sequencing 
on 330 of these samples, reverse-phase protein array (RPPA) on 
152 samples, and low-pass and high-pass whole-genome 
sequencing (WGS) on 100 and 19 tumor/normal pairs, respec- 
tively (Supplemental Experimental Procedures). For 19 samples, 
non-malignant adjacent prostate samples were also examined 
for DNA methylation and RNA/miRNA expression analyses. 

The Molecular Taxonomy of Primary Prostate Cancer 

Previous studies indicate that many genomically distinct subsets
of prostate cancer exist. These are driven in some cases by
frequent events, such as androgen-regulated fusions of ERG
and other ETS family members, or recurrent SPOP mutations
and, in other cases, by less common genomic aberrations. Given
the comprehensive nature of our data, we sought to unify these
disparate findings to establish a molecular taxonomy of primary
disease that integrates results from somatic mutations, gene
fusions, somatic copy-number alterations (SCNA), gene expression,
and DNA methylation. We first performed unsupervised
clustering of data from each molecular platform, as well as integrative
clustering using iCluster (Shen et al., 2009) (Figures S2,
S3, S4, S5, S6, and S7). These analyses uncovered both known
and novel associations, with 74% of all tumors being assignable
to one of seven molecular classes based on distinct oncogenic
drivers: fusions involving (1) ERG, (2) ETV1, (3) ETV4, or (4) FLI1
(46%, 8%, 4%, and 1%, respectively); mutations in (5) SPOP
or (6) FOXA1; or (7) IDH1 mutations (11%, 3%, and 1%, respectively)
(Figures 1 and S2 and Table S1A).
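
The class frequencies quoted above can be tallied with a few lines of arithmetic. The percentages are taken from the text; the dictionary keys are just labels chosen for this sketch.

```python
# Percent of tumors per molecular class, as reported in the text.
subtype_percent = {
    "ERG": 46,
    "ETV1": 8,
    "ETV4": 4,
    "FLI1": 1,
    "SPOP": 11,
    "FOXA1": 3,
    "IDH1": 1,
}

assigned = sum(subtype_percent.values())
unassigned = 100 - assigned
ets_classes = sum(subtype_percent[g] for g in ("ERG", "ETV1", "ETV4", "FLI1"))

print(assigned, unassigned, ets_classes)
```

The seven classes account for 74% of tumors, leaving 26% unassigned. The four ETS classes sum to 59%, which is more than the 53% of tumors reported to carry ETS fusions; this appears to be because, as the Figure 1 legend indicates, the ETV1, ETV4, and FLI1 classes also include tumors that overexpress the full-length transcript without a fusion.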

In total, 53% of tumors were found to have ETS family gene
fusions (ERG, ETV1, ETV4, and FLI1) after analysis with two complementary
algorithms (Sboner et al., 2010; Wang et al., 2010)
(see the Experimental Procedures). While TMPRSS2 was the
most frequent fusion partner in all ETS fusions, we identified
fusions with other previously described androgen-regulated 5'
partner genes, including SLC45A3 and NDRG1 (Table S1E).
We also identified several tumors that overexpressed full-length
ETS transcripts that were mutually exclusive with ETS fusions
(12 ETV1 high tumors, 6 ETV4, and 2 FLI1) (Table S1E). ETS
overexpression in these cases could possibly be mediated via
epigenetic mechanisms or cryptic translocations of the entire
gene locus to a transcriptionally active neighborhood. In the
one case with elevated ETV1 full-length expression studied by
whole-genome sequencing, we identified a cryptic genomic
Figure 1. The Molecular Taxonomy of Primary Prostate Cancer

Comprehensive molecular profiling of 333 primary prostate cancer samples revealed seven genomically distinct subtypes, defined (top to bottom) by ERG
fusions (46%), ETV1/ETV4/FLI1 fusions or overexpression (8%, 4%, 1%, respectively), or by SPOP (11%), FOXA1 (3%), and IDH1 (1%) mutations. A subset of
these subtypes was correlated with clusters computationally derived from the individual characterization platforms (somatic copy-number alterations,
methylation, mRNA, microRNA, and protein levels from reverse phase protein arrays). The heatmap shows DNA copy-number for all cases, with chromosomes
shown from left to right. Regions of loss are indicated by shades of blue, and gains are indicated by shades of red.

See also Figures S1, S2, S3, S4, S5, S6, and S7 and Tables S1A, S1B, S1E, and S2.



rearrangement 3' of the ETV1 locus with a region on chromosome
14 near the MIPOL1 gene adjacent to FOXA1. This event
is similar to previously described ETV1 translocations in LNCaP
and MDA-PCa2b cell lines and in patient samples (Tomlins et al.,
2007; Gasi et al., 2011). Overall, while fusions in the four genes
were mostly mutually exclusive, three tumors showed evidence
for fusions involving more than one of these genes (Table S1E).
Given that histologically defined single tumor foci have been
shown to be rarely composed of different ETS fusion-positive
clones (Cooper et al., 2015; Kunju et al., 2014; Pflueger et al.,
2011), it is likely these cases reflect convergent phenotypic
evolution in clonally heterogeneous tumors. Tumors defined by
SPOP mutations were mutually exclusive with all ETS fusion-positive
cases, though four of the SPOP mutant tumors also
possessed FOXA1 mutations. In all four of these tumors, both
the SPOP and FOXA1 mutations were clonal, indicating that
they are present in the same tumor cells.
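
One way to gauge how unlikely a complete absence of overlap between two alterations would be by chance is a one-sided hypergeometric (Fisher-type) calculation. The sketch below is an illustration only: the text does not say which test, if any, was applied, the function name is hypothetical, and the counts are rough values back-calculated from the reported percentages (53% ETS fusion-positive and 11% SPOP mutant among 333 tumors).

```python
from math import comb

def prob_overlap_at_most(k, n_total, n_a, n_b):
    """P(overlap <= k) if n_b tumors were drawn at random from n_total,
    of which n_a carry alteration A (hypergeometric null model)."""
    upper = min(k, n_a, n_b)
    favorable = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(upper + 1)
    )
    return favorable / comb(n_total, n_b)

n_tumors = 333
n_ets = round(0.53 * n_tumors)   # approximate count of ETS fusion-positive tumors
n_spop = round(0.11 * n_tumors)  # approximate count of SPOP mutant tumors

p_no_overlap = prob_overlap_at_most(0, n_tumors, n_ets, n_spop)
print(n_ets, n_spop, p_no_overlap)
```

Under these approximate counts, zero overlap is extremely improbable if the two alterations occurred independently, which is consistent with describing them as mutually exclusive.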

Beyond the class-defining lesions, there were multiple patterns
of both known and novel concurrent alterations in key
prostate cancer genes. The former included the preponderance
of PTEN deletions in ERG fusion-positive cases (Taylor et al.,
2010). Similarly, SPOP mutations have previously been found
to occur in ~10% of clinically localized prostate cancers, are
mutually exclusive with tumors defined by ETS rearrangements,
and may designate a distinct molecular class of disease based
primarily on distinctive SCNA profiles (including deletion of
CHD1, 6q, and 2q) (Barbieri et al., 2012; Blattner et al., 2014).
Beyond reaffirming these known patterns, our taxonomy revealed
new relationships and subtypes. Specifically, the SPOP
mutant/CHD1-deleted subset of prostate cancers had notable
molecular features, including elevated levels of DNA methylation,
homogeneous gene expression patterns, and frequent
overexpression of SPINK1 mRNA, supporting SPOP mutation
as a key feature in the molecular taxonomy of prostate cancer.
Interestingly, mRNA, copy-number, and methylation profiles
were similar in tumors with FOXA1 mutations and those with
SPOP mutations. Furthermore, we identified a new genomically
distinct subtype of prostate cancer defined by hotspot mutations
in IDH1, described in greater depth below.

Despite this detailed molecular taxonomy of primary prostate
cancers, 26% of all tumors studied appeared to be driven by still-occult
molecular abnormalities or by one or more frequent alterations
that co-occur with the genomically defined classes. Some
of these tumors showed a high burden of copy-number alterations
or DNA hypermethylation. Enrichment analysis indicated
that this subset of tumors was enriched for mutations in TP53,
KDM6A, and KMT2D; deletions of chromosomes 6 and 16;
and amplifications of chromosomes 8 (spanning MYC) and 11
(CCND1) (Table S2). To characterize this group further, we performed
whole-genome sequencing of 19 tumor specimens and
Figure 2. Recurrent Alterations in Primary Prostate Cancer

The spectrum and type of recurrent alterations and genes (mutations, fusions, deletions, and overexpression) in the cohort are shown (left to right) grouped by the
molecular subtypes defined in Figure 1. On the right, the statistical significance of individual mutant genes (MutSig q value) is shown. Mutations in IDH1, PIK3CA,
RB1, KMT2D, CHD1, BRCA2, and CDK12 are also shown, despite their not being statistically significant. SPINK1 overexpression is shown for reference.

See also Tables S1B, S1C, S1D, and S1E.



their matched normal tissues, a subset of which had high tumor 
cellularity but still lacked DNA copy-number alterations or any 
known or presumed driver lesions. Interestingly, no occult driver 
abnormalities or highly recurrent regulatory mutations were iden- 
tified, such as the TERT promoter mutation common to many 
other tumor types (Khurana et al., 2013). Therefore, a significant 
(up to 26%) subset of primary prostate cancers of both good and 
poor clinical prognosis (including those with Gleason scores 
of ≥8) is driven by as-yet-unexplained molecular alterations.

mRNA clusters were tightly correlated with ETS fusion status, 
where mRNA cluster 1 consisted primarily of ETS-negative tu- 
mors and mRNA clusters 2 and 3 were split among ETS fusion- 
positive tumors (Figures 1 and S4). miRNA clustering showed a 
similar pattern, revealing a general difference in miRNA expres- 
sion between ETS-positive and -negative tumors (Figures 1 and 
S6). Clustering of RPPA data identified three distinct subgroups, 
with clusters exhibiting elevated PI3K/AKT, MAP-kinase, and re- 
ceptor tyrosine kinase activity (Figure S7A). The cluster was not 
enriched, however, in genomic alterations in these pathways, 
and in general, there was little correlation of increased pathway 
activity (as measured by phospho-AKT and other downstream 
phospho-proteins) with the frequent genomic alterations in the 
pathways (see the example of PTEN deletions in Figure S7B). 

Recurrently Altered Genes and Their Patterns across 
Subtypes 

The overall mutational burden of the cohort, inferred from
whole-exome sequencing, was 0.94 mutations per megabase
(median, range 0.04-28 per megabase), which corresponds to
19 non-synonymous mutations per tumor genome (median;
13-25, 25th and 75th percentiles, respectively). This is consistent
with prior exome and genome-scale sequencing results for localized
prostate cancers (Barbieri et al., 2012; Baca et al., 2013) and
is lower than the mutational burden of metastatic prostate
cancers (Gundem et al., 2015; Grasso et al., 2012; Robinson
et al., 2015). These results reaffirm that prostate cancer possesses
a lower mutational burden than many other epithelial
tumor types that are not associated with a strong exogenous
mutagen (Alexandrov et al., 2013; Lawrence et al., 2013). Prior
exome sequencing of 112 prostate cancers identified 12 recurrently
mutated genes through focused assessment of point mutations
and short insertions and deletions (Barbieri et al., 2012).
By comparison, mutational significance analysis of these 333
tumor-normal pairs by MutSigCV (Lawrence et al., 2013, 2014)
identified 13 significantly mutated genes (q value < 0.1), seven
of which had not been previously identified (Figure 2 and Tables
S1B and S1C). Among the significantly mutated genes, SPOP,
TP53, FOXA1, PTEN, MED12, and CDKN1B were previously
identified as recurrently mutated. Additional clinically relevant
genes were identified with lower mutation frequencies; these
included genes within canonical kinase signaling pathways
(BRAF, HRAS, AKT1), the beta-catenin pathway (CTNNB1),
and the DNA repair pathway (ATM). The rate of BRAF mutations
(2.4%) seen in this study is higher than previously reported; these
include several known activating mutations but, curiously, not
the canonical V600E hotspot. We identified no BRAF fusions,
which had previously been reported in a subset of clinically
advanced prostate cancer (Palanisamy et al., 2010). NKX3-1,
previously implicated in familial prostate cancer syndromes
and often found to be deleted, was also somatically mutated in
this cohort (1% of tumors). While its functional significance is unknown,
ZMYM3, an epigenetic regulatory protein not previously
implicated in prostate cancer but infrequently mutated in Ewing
sarcomas (Tirode et al., 2014) and various pediatric cancers
(Huether et al., 2014), was also recurrently mutated (2% of tumors).
Genes with known biological relevance that were mutated
at frequencies just below the threshold of significance (q value <
0.01) included KMT2C (MLL3), KMT2D (MLL4), APC, IDH1, and
PIK3CA (Figure 2 and Tables S1B and S1C). Mutations in the
tumor suppressor genes KMT2C, KMT2D, and APC were mostly
truncating; the IDH1 and PIK3CA mutations occurred in previously
characterized hotspots and thus may have therapeutic
relevance for those occasional tumors with these mutations.
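
A per-megabase mutational burden of the kind reported at the start of this section is a simple ratio of mutation count to the number of adequately covered bases, summarized across tumors by the median and quartiles. The sketch below illustrates that arithmetic only; the function name, mutation counts, and covered-base totals are hypothetical and are not taken from the cohort.

```python
from statistics import median, quantiles

def mutations_per_mb(n_mutations, covered_bases):
    """Mutations per megabase of sequence with adequate coverage."""
    return n_mutations / (covered_bases / 1_000_000)

# Invented (mutation count, covered bases) pairs for five tumors.
tumors = [
    (20, 30_000_000),
    (30, 30_000_000),
    (45, 30_000_000),
    (12, 30_000_000),
    (33, 30_000_000),
]

rates = [mutations_per_mb(n, bases) for n, bases in tumors]
median_rate = median(rates)
q1, _, q3 = quantiles(rates, n=4)  # 25th and 75th percentiles

print(f"median {median_rate:.2f}/Mb (IQR {q1:.2f}-{q3:.2f})")
```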

Notwithstanding these key somatic mutations, the most
frequent molecular abnormalities involved chromosomal arm-level
copy-number alterations (Taylor et al., 2010). These alterations
included recurrent genomic gains of chromosome 7
and 8q and heterozygous losses of 8p, 13q, 16q, and 18 (Figure
S3A). Significance analysis of recurrent focal DNA copy-number
alterations revealed 20 amplifications and 35 deletions
(q value < 0.25, GISTIC 2.0; Figure S3A and Table S1D). Recurrent
focal amplifications included those spanning known oncogenes
such as CCND1 (11q13.2, 2%), MYC (8q24.21, 8%),
and FGFR1 and WHSC1L1 (8p11.23, 8%). Recurrent focal deletions
were much more common. Homozygous deletions spanning
the PTEN locus occurred at one of the highest rates of
any tumor type studied thus far (15%). Focal deletions of the
region between the TMPRSS2 and ERG genes on 21q22.3,
which result in TMPRSS2-ERG fusions, were unique to prostate
cancers, as expected. Other focal deletions include those spanning
tumor suppressors TP53 (17p13.1), CDKN1B (12p13.1),
MAP3K1 (5q11.2), and FANCD2 (3p26), as well as SPOPL
(2q22.1) and the complex locus spanning FOXP1/RYBP/SHQ1
(3p13). MAP3K7 (6q12-22) was also frequently deleted, along
with deletion of CHD1 (5q15-q21); co-deletion of these loci
has been associated with aggressive ETS-negative prostate
cancer (Kluth et al., 2013; Rodrigues et al., 2015).

As the pattern and extent of SCNAs in prostate cancer ge- 
nomes have been associated with probability of disease recur- 
rence and metastasis in primary prostate cancers (Taylor et al., 
2010; Hieronymus et al., 2014; van Dekken et al., 2004; Paris 
et al., 2004), we sought to identify similar structure in the burden 
of SCNAs by performing hierarchical clustering of arm-level 
alterations. We identified three major groups of prostate can- 
cers, one with mostly unaltered genomes (hereafter referred 
to as quiet), a second group encompassing 50% of all tumors 
with an intermediate level of SCNAs, and a third group with a 
high burden of arm-level genomic gains and losses (Figures
S3B and S3C). While a formal outcome analysis was not possible 
due to the limited clinical follow-up available for this cohort, the 
subset of tumors with the greatest burden of SCNAs had signif- 
icantly higher Gleason scores and PSA levels than the other two 
groups (Figures S3B-S3D). The tumors in this group also had 
significantly higher tumor cellularity (Figure S3C). 
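
To make the idea of grouping tumors by arm-level alterations concrete, here is a minimal agglomerative (hierarchical) clustering sketch. The choices of Manhattan distance and average linkage are assumptions made for this example, since the text does not specify them; the function names and the six toy tumor profiles are invented.

```python
def manhattan(a, b):
    """Sum of absolute differences between two arm-level profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between members of two clusters."""
    dists = [manhattan(profiles[i], profiles[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Invented arm-level calls per tumor: -1 loss, 0 neutral, +1 gain.
profiles = [
    [0, 0, 0, 0, 0, 0],      # quiet
    [0, 0, 0, 0, 0, 0],      # quiet
    [0, -1, 0, 0, 1, 0],     # intermediate
    [0, -1, 0, 0, 1, 0],     # intermediate
    [1, -1, -1, 1, 1, -1],   # high burden
    [1, -1, -1, 1, 0, -1],   # high burden
]

groups = agglomerate(profiles, n_clusters=3)
print(groups)
```

With these toy profiles the three resulting groups correspond to quiet, intermediate, and high-burden genomes. Real data would involve hundreds of tumors and all chromosome arms, where a dedicated clustering library would be the practical choice.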

Epigenetic Changes Define Molecularly Distinct 
Subtypes of Prostate Cancer 

Integrative analysis of genetic and epigenetic changes revealed
a diversity of DNA methylation changes that defined molecularly
distinct subsets of primary prostate cancer (Figure 3). Unsupervised
hierarchical clustering of the most variably hypermethylated
CpGs identified four epigenetically distinct groups of
prostate cancers (Figures S5A and S5B). When integrated with
the molecular taxonomy defined above, we found a number of
striking associations. Among these was a notable pattern within
ERG fusion-positive tumors. Specifically, while nearly two-thirds
of all ERG fusion-positive tumors belonged to an unsupervised
cluster with only moderately elevated DNA methylation (DNA
methylation cluster 3), the remaining ERG fusion-positive tumors
comprised a distinct hypermethylated cluster (cluster 1) that was
almost exclusively associated with ERG fusions. On average,
this cluster contained twice the number of hypermethylated
loci as DNA methylation cluster 3 (Figure S5A), and the epigenetic
patterns were largely distinct from those of ETV1 and
ETV4 fusion-positive tumors, which showed more heterogeneous
methylation. What drives these epigenetically distinct
groups of ETS fusion-positive tumors is unknown, but there is
considerable diversity in their DNA methylation profiles that
may reflect altered epigenetic silencing (Figures S5A and S5B).
Together, these results support further ETS fusion-based subtyping
of disease but also reveal a greater molecular and likely
biological diversity among ERG fusion-positive tumors than previously
appreciated. Likewise, these results are consistent with
in vivo mouse modeling and expression profiling studies that
suggest important molecular and clinicopathological differences
between ERG and non-ERG ETS fusion-positive tumors (Baena
et al., 2013; Tomlins et al., 2015).
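
Clustering on the "most variable" CpGs implies a feature-selection step before clustering. The sketch below shows one plausible reading, ranking probes by the variance of their beta values across tumors and keeping the top k. It does not model the additional requirement that loci be hypermethylated relative to normal tissue, and the function name, probe names, and beta values are invented.

```python
from statistics import pvariance

def most_variable_probes(beta_by_probe, k):
    """Return the k probe IDs with the highest variance in beta value."""
    ranked = sorted(
        beta_by_probe,
        key=lambda probe: pvariance(beta_by_probe[probe]),
        reverse=True,
    )
    return ranked[:k]

# Invented beta values (0 = unmethylated, 1 = fully methylated) for four tumors.
beta_by_probe = {
    "probe_A": [0.10, 0.12, 0.11, 0.09],
    "probe_B": [0.05, 0.90, 0.10, 0.85],
    "probe_C": [0.40, 0.60, 0.35, 0.65],
}

top = most_variable_probes(beta_by_probe, k=2)
print(top)
```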

SPOP and FOXA1 mutant tumors exhibited homogeneous 
epigenetic profiles. These tumors belonged almost exclusively 
to DNA methylation cluster 2, a group that also contained a 
majority of the ETV1 and ETV4 but not EF?G-positive tumors. 
Lastly, the IDH1 mutant tumors were notable given their strongly 
elevated levels of genome-wide DNA hypermethylation (Fig- 
ure S5B). While of low incidence, these IDH1 R132 mutant 
tumors defined a distinct subgroup of what appears to be 
early-onset prostate cancer (Figure 3B) that possesses fewer 
DNA copy-number alterations (see Figure 1) or other canonical 
genomic lesions that are common to most other prostate can- 
cers. IDH1 and IDH2 mutations have been associated with a 
DNA methylation phenotype in other tumor types, most notably 
in gliomas (Noushmehr et al., 2010) and acute myeloid leukemias 
(AML, Figueroa et al., 2010). Curiously, IDH1 mutant prostate 
cancers possessed even greater levels of genome-wide hyper- 
methylation than either glioma or AML IDH1 mutant tumors (Fig- 
ure 3B). After further investigating DNA methylation differences 
between IDH mutant and wild-type tumors among prostate 
cancers, gliomas, and AMLs, we found that hypermethylated 
loci were specific to the cancer type rather than IDH mutants 
(Figure S5F). 

Integrating these epigenetic data with mRNA expression 
levels, we identified 164 genes that were epigenetically silenced 
in subsets of the cohort (Figure S5C and Table S1F). These 
silenced genes were significantly enriched for genes previously 
found to be differentially expressed in prostate cancer— specif- 
ically, genes that are downregulated in metastatic prostate can- 
cer (Chandran et al., 2007) and genes involved in prostate organ 
development (Schaeffer et al., 2008) (q value <2.0 x 10“^). These 
164 silenced genes displayed heterogeneous frequencies of 
epigenetic silencing across the cohort. For example, SHF, 
FAXDC2, GSTP1, ZNF154, and KLF8 were epigenetically 
silenced in almost all tumors (>85%) whereas STAT6 was 
silenced predominantly in ETS fusion-positive tumors and not 
in SPOP and IDH1 mutant tumors. Conversely, HEXA was 
silenced preferentially in SPOP mutant tumors compared to 
ERG fusion-positive tumors (86.5% versus 14.5%, respectively, 
p < 5.4 X 10“^^). Consistent with their increased DNA hypermethylation, 
the IDH1 mutant prostate tumors also possessed the 
greatest number of epigenetically silenced genes among all prostate 
tumors (Table S1F). 

Figure 3. Hypermethylation Is Common across Primary Prostate Cancer 

(A) Primary prostate cancers show diverse methylation changes compared to normal prostate samples (left). Unsupervised clustering was performed on the beta- 
values of the 5,000 most hypermethylated loci, and the results mapped to the genomic subtypes. ERG-positive tumors had a high diversity of methylation 
changes, with a distinct subgroup (cluster 1) nearly unique to this group. SPOP and FOXA1 mutant tumors also exhibited global hypermethylation. 

(B) IDH1 mutant prostate cancers, which are associated with younger age, are among the most hypermethylated tumors, as in glioblastoma (GBM) and AML. 
See also Figure S4 and Table S1F. 

AR Activity Is Variable in Primary Prostate Cancers 

The androgen receptor (AR) regulates normal prostate develop- 
ment, as well as critical growth and survival programs in prostate 
carcinoma. Primary prostate cancer is androgen dependent, and 
androgen activity is a central axis in prostate cancer pathogen- 
esis, driving the creation and overexpression of most ETS fusion 
genes (Lin et al., 2009; Mani et al., 2009; Tomlins et al., 2005). 
However, the extent to which individual primary prostate can- 
cers differ in androgen sensitivity or dependence is unknown, 
and the issue has translational implications because AR target- 
ing is therapeutically important. To address these questions, 
we sought to infer the AR output of tumors by calculating an 
AR activity score from the expression pattern of 20 genes that 
are experimentally validated AR transcriptional targets (Hierony- 
mus et al., 2006). This score suggested that a broad spectrum of 
AR activity exists across all prostate tumors, as well as between 
genomic subtypes (Figure 4A). Although ETS fusion genes are 
under AR control, the ETS fusion-positive groups had variable 
AR transcriptional activity. In contrast, we found that tumors 
with SPOP or FOXA1 mutations had the highest AR transcrip- 
tional activity of all genotypically distinct subsets of prostate 
cancer (p = 1.1 x 1 and 0.04, respectively, t test). Consistent 
with this, SPOP mutations have been previously implicated in 
androgen signaling in model systems, since both AR and AR 
coactivators are substrates deregulated by SPOP mutation 
(Geng et al., 2013; An et al., 2014; Geng et al., 2014), providing 
a possible explanation for the associated increase in AR activity 
seen in this subtype of prostate cancers. 

Figure 4. The Diversity of Androgen Receptor Activity in Primary Prostate Cancer 

(A) Androgen receptor activity, as inferred by the induction of AR target genes, was significantly increased in SPOP and FOXA1 mutant tumors when compared to 
normal prostate or ERG-positive tumors. This increase in activity cannot be fully explained by AR mRNA or protein levels. 

(B) Multiple known AR splice variants were detected in benign prostate (left) and primary prostate cancer (right), with the AR-V7 variant detected in 50% of tumors. 

(C) Real-time qPCR comparison of AR-V7 in 74 tumor samples (gray) and 5 adjacent-normal samples (blue). 

(D and E) (D) FOXA1 missense mutations were clustered in the forkhead domain, mostly in residues that do not form contacts with DNA (see also the 3D structure 
in panel E). 
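
An expression-based activity score of the kind described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes that the score is the sum of per-gene z-scores across samples, the function name ar_activity_scores is hypothetical, and the two genes and all expression values are toy stand-ins for the 20-gene signature of Hieronymus et al. (2006). 

```python
from statistics import mean, pstdev


def ar_activity_scores(expression, target_genes):
    """Sum of per-gene z-scores for each sample (assumed scoring scheme).

    expression: dict mapping gene symbol -> list of per-sample values
                (for example log-scale mRNA levels), same sample order.
    target_genes: list of gene symbols that make up the signature.
    """
    n_samples = len(expression[target_genes[0]])
    scores = [0.0] * n_samples
    for gene in target_genes:
        values = expression[gene]
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            # A gene with no variation carries no information; skip it.
            continue
        for i, value in enumerate(values):
            scores[i] += (value - mu) / sd
    return scores


# Toy data: three samples, two illustrative AR-regulated genes.
toy_expression = {
    "KLK3": [1.0, 2.0, 3.0],
    "TMPRSS2": [2.0, 4.0, 6.0],
}
toy_scores = ar_activity_scores(toy_expression, ["KLK3", "TMPRSS2"])
print(toy_scores)
```

Because each gene is standardized before summing, highly expressed targets do not dominate the score, and the scores of a cohort are centered on zero. Group differences, such as those between genomic subtypes, would then be tested on these per-sample scores. 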

While AR transcriptional output is a proxy for ligand-driven 
AR activity in many tumors, AR transcript variants have been 
described that encode truncated AR proteins that lack the 
ligand-binding domain and hence are capable of activating AR 
target genes in the absence of androgens (Dehm et al., 2008; 
Watson et al., 2010). Using RNA sequencing reads that spanned 
the splice junctions unique to each AR variant, we quantified 
the expression of these AR transcript variants. This analysis 
revealed that several AR splice variants, most notably AR-V7, 
can be detected at low levels in primary tumors and, in a few 
cases, in adjacent benign prostate tissue (Figure 4B), and 
we validated these expression levels with qPCR (Figure 4C). 
However, their expression was not associated with differential 
expression of known AR target genes or with the seven previ- 
ously defined genomic subtypes. Most detected splice forms 
were truncated after the DNA-binding domain by the presence 
of a cryptic exon rather than by skipping those exons encoding 
the ligand-binding domain. Truncated AR splice variants were 
previously assumed to be expressed primarily in metastatic 
castration-resistant prostate cancers, where, at least for AR- 
V7, their presence was associated with resistance to hormone 
therapy (Antonarakis et al., 2014). Hence, our finding that they 
are expressed in hormone-naive primary prostate cancers is 
notable. 

Figure 5. Alterations in Clinically Relevant Pathways 

(A) Alterations in DNA repair genes were common in 
primary prostate cancer, affecting almost 20% of 
samples through mutations or deletions in BRCA2, 
BRCA1, CDK12, ATM, FANCD2, or RAD51C. 

(B) Focal deletions of FANCD2 were found in 7% of 
samples and were associated with reduced mRNA 
expression of FANCD2. 

(C) The RAS or PI-3-Kinase pathways were altered 
in about a quarter of tumors, mostly through dele- 
tion or mutation of PTEN, but also through rare 
mutations in other pathway members. 

(D) AKT1 mutations were found in three samples. 
Two of them were the known activating E17K, and 
the third one affected the D323 residue, which is 
adjacent to E17 in the protein structure. 

(E) One of the observed PIK3CB mutations, E552K, 
is paralogous to the known activating E545K mu- 
tation in PIK3CA, and the RAC1 Q61 and RRAS2 
Q72 mutations are paralogous to the Q61 muta- 
tions in KRAS. 

(F) BRAF mutations were found in 2% of samples, 
mostly in known non-V600E hotspots in the kinase 
domain. 

See also Figure S3. 

In prostate cancers, the degree of AR pathway output 
is controlled not only by AR mRNA and protein expression 
levels, but also by expression of and mutations in AR cofactors 
(Heemers and Tindall, 2007). It is therefore notable that FOXA1 
was recurrently mutated in our cohort, as it is a pioneering tran- 
scription factor that targets AR and has a demonstrated role in 
prostate cancer oncogenesis (Jin et al., 2013). We identified 
FOXA1 mutations in 4% of the primary prostate cancers studied 
here, which is similar to the mutation frequency observed previ- 
ously (Barbieri et al., 2012; Grasso et al., 2012) (Figure 4A). While 
a subset of these mutations was present in tumors that also 
possessed SPOP mutations and had elevated levels of AR 
output, FOXA1 mutations were mutually exclusive with all other 
alterations that define the genomic subclasses 
described here. While there 
were some truncating mutations near 
the C terminus and the C-terminal part of 
the forkhead domain, the majority of the 
mutations found here and in other pros- 
tate cancer cohorts were missense muta- 
tions that primarily affect the winged- 
helix DNA binding domain of FOXA1. 
Curiously, these mutations do not directly 
alter FOXA1 DNA-binding residues (Fig- 
ures 4D and 4E), a pattern similar to the 
FOXA1 mutations recently found in lobular breast cancers 
(TCGA, unpublished data), which suggests that the impact of 
FOXA1 mutations has less to do with altering DNA binding 
than with disrupting or altering interactions with other chro- 
matin-bound cofactors. 

Clinically Actionable DNA Repair Defects in Primary 
Prostate Cancers 

Prior data indicate that several DNA repair pathways are dis- 
rupted in a subset of prostate cancers (Karanika et al., 2014; 
Pritchard et al., 2014). Moreover, the PARP inhibitor olaparib is 
effective in some patients with prostate cancer (Mateo et al., 
2014). Here, we found inactivation of several DNA repair genes 
that collectively affected 19% of affected individuals (Figure 5A). 
While we found only one inactivating BRCA1 germline muta- 
tion, a frameshift at V923 caused by a 4 bp deletion (Clinvar 
RCV000083190.3), BRCA2 inactivation affected 3% of tumors, 
including both germline and somatic truncating mutations. All 
six BRCA2 germline mutations were K3326*, a C-terminal trun- 
cating mutation with debated functional impact but increased 
prevalence in several tumor types (Farrugia et al., 2008; Martin 
et al., 2005; Delahaye-Sourdeix et al., 2015). Two additional 
tumors possessed focal BRCA2 homozygous deletions that 
were accompanied by very low BRCA2 transcript expression. 
Four tumors (1%) possessed either loss-of-function mutations 
or homozygous deletion of CDK12, a gene that has been impli- 
cated in DNA repair by regulating expression levels of several 
DNA damage response genes (Blazek et al., 2011) and is recur- 
rently mutated in metastatic prostate cancer (Grasso et al., 
2012). ATM, an apical kinase of the DNA damage response, 
which is activated by the Mre11 complex and mediates down- 
stream checkpoint signaling, was affected by a nonsense muta- 
tion in one case and by a likely kinase-dead hotspot N2875 
mutation in two cases. FANCD2 was similarly affected by diverse 
uncommon lesions, including a truncating mutation in one tumor, 
homozygous deletion in two tumors, and focal heterozygous los- 
ses in 6% of the cohort (Figure 5B). RAD51C (3%) was affected 
by focal DNA losses, most of which were heterozygous. Finally, 
it was notable that heterozygous losses of BRCA2 (13q13.1) 
almost always coincided with concurrent loss of the distant 
RB1 tumor suppressor gene (13q14.2) (Figure S3D). The obser- 
vation that nearly 20% of primary prostate cancers bear genomic 
defects involving DNA repair pathways is remarkably consistent 
with the recently announced TOPARP-A Phase II trial results in 
patients with metastatic castration-resistant prostate cancer, 
indicating that clinical responses to the PARP inhibitor olaparib 
likely occurred in the subgroup of tumors bearing defects in 
DNA repair genes (Mateo et al., 2014; Robinson et al., 2015). 

Clinically Actionable Lesions in PI3K and Ras Signaling 

The long tail of the frequency distribution of molecular abnor- 
malities is particularly notable among primary prostate cancers. 
Beyond PTEN, which was deleted or mutated in 17% of the 
cohort, various driver mutations in effectors of PI3K signaling 
were present at low incidence (Figure 5C). PIK3CA, which en- 
codes the 110 kDa catalytic subunit of phosphatidylinositol 3-ki- 
nase, was mutated in six tumors, including one case possessing 
coincident activating mutations (E542A and N345I), both of which 
appeared to be subclonal. The other four PIK3CA mutations were 
all known activating mutational hotspots (E545K, Q546K, N345I, 
and C420R), while one had a mutation of unknown function 
(E474A). Focal PIK3CA amplification with associated mRNA 
overexpression occurred in ~1% of cases. Interestingly, PIK3CB 
was mutated in two tumors that also possessed coincident ho- 
mozygous deletions of PTEN, both of which were clonal. PIK3CB 
E552K was found in one tumor at a paralogous residue to the 
canonical PIK3CA helical domain E545K mutant and is presum- 
ably activating (Figure 5E). As PTEN-deleted tumors are likely 
PIK3CB-dependent due to the feedback inhibition of PIK3CA, 
co-existent loss and mutation of PTEN and PIK3CB may be 
elevating PI3K pathway output and perhaps indicating a set of tu- 
mors in which combined PI3K and androgen signaling inhibition 
may be effective (Schwartz et al., 2015). Among other lesions 
that drive PI3K signaling, AKT1 was mutated in three tumors. 
Two tumors had the known E17K hotspot mutation, while another 
encoded a D323Y mutation. Whereas E17K is the most common 
hotspot in AKT1 across human cancer, the D323Y variant is un- 
common, having been identified previously in one lung adenocar- 
cinoma (Cancer Genome Atlas Research Network, 2014) and 
one urothelial bladder cancer (Guo et al., 2013). Nevertheless, 
while distant linearly from the activating E17K hotspot, in three di- 
mensions, this D323Y kinase domain mutant directly abuts the 
PH-domain containing E17K (Figure 5D) and has been described 
as potentially activating (Parikh et al., 2012). 

We also identified known or presumed driver mutations in 
several other genes of the MAPK pathway, affecting 25% of 
the tumors (Figure 5A). HRAS was mutated in four tumors, of 
which three were Q61R hotspot mutations. Two mutations arose 
in other Ras family small GTPases. While both RAC1 Q61R and 
RRAS2 Q72L occurred only once each, they affected residues 
paralogous to the RAS Q61 hotspot (Figure 5E) (Chang et al., 
2015). We also identified eight BRAF mutations, though, curi- 
ously, none were the common V600E mutation that is prevalent 
in cutaneous melanomas, thyroid cancers, and many other tu- 
mor types. Five BRAF mutations are likely activating, including 
known hotspots (K601E, G469A, L597R), two of which confer 
sensitivity to MEK inhibitors (Dahlman et al., 2012; Bowyer 
et al., 2014). Another mutation was a likely activating in-frame 
3 amino acid deletion at K601 (Figure 5F), while the final mutation 
(F468C) affected the adjacent residue to the known G469 hot- 
spot. Together, these findings reveal a long tail of low-incidence 
potentially actionable predicted driver mutations present across 
the molecular taxonomy of prostate cancer. 

Comparison with Metastatic Prostate Cancer 

To put these results in context, we compared our findings with 
those from a recently published cohort of 150 castration-resistant 
metastatic prostate cancer samples (Robinson et al., 2015). The 
analysis revealed some similarities and many differences be- 
tween primary and treated metastatic disease. Although the 
overall burden of copy-number alterations and mutations was 
significantly higher in the metastatic samples (Figure 6A), consis- 
tent with previous findings (Taylor et al., 2010; Grasso et al., 
2012), the primary and metastatic samples were remarkably 
similar in their subtype distribution, with the exception that the 
metastatic dataset contained no IDH1 mutant tumors (Figure 6B). 
We compared the frequencies of all recurrently altered genes 
described in both studies and found that, similar to the overall 
burden of genomic alterations (Figure 6A), many genes and path- 
ways have increased alteration rates in the metastatic samples 
(Figure 6C and Table S3). Androgen receptor signaling was 
more frequently altered in the metastatic samples, most often 
by amplification or mutation of AR, events that were essentially 
absent in primary samples. Interestingly, SPOP mutations were 
somewhat less frequent in the metastatic samples (8% versus 
11% in the primary samples). DNA repair and PI3K pathway alter- 
ations were more frequent in the metastatic samples, as were 
mutations or deletions of TP53, RB1, KMT2C, and KMT2D. Inter- 
estingly, we found no focal, clonal MYCL amplifications, which 
were recently described in primary prostate cancer (Boutros 
et al., 2015), in either dataset nor in a separate set of 63 untreated 
prostate cancer samples (Hovelson et al., 2015). 

DISCUSSION 

The comprehensive molecular analyses of primary prostate cancers 
presented here reveal highly diverse genomic, epigenomic, 
and transcriptomic patterns. Major subtypes could be defined 
by fusions of the ETS family genes ERG, ETV1, ETV4, or FLI1 
and by mutations in SPOP, FOXA1, or IDH1. However, even 
within the groups, there was significant diversity in DNA copy- 
number alterations, gene expression, and DNA methylation. 
The mutational heterogeneity mirrors the heterogeneous natural 
history of primary prostate cancers. 

Although the broad spectrum of copy-number alterations in 
tumors with ETS fusions has been previously characterized 
(Demichelis et al., 2009; Taylor et al., 2010), here we uncovered 
additional differences between the epigenetic profiles of those 
tumors. We found that ERG fusion-positive tumors can be sub- 
divided into two methylation subtypes: one with lower levels of 
methylation, and one with a distinct spectrum of hypermethyla- 
tion. Many genes were epigenetically silenced as a result of 
the hypermethylation in the latter tumors. While further studies 
will be required to determine which silencing events are linked 
to prostate cancer pathogenesis, the findings presented here 
reveal variability among what was previously considered to be 
genetically homogeneous prostate cancer subtypes. 

We have also identified a distinct subgroup of tumors with 
IDH1 R132 mutations that is associated with younger age at 
diagnosis. Although IDH1 mutations have previously been iden- 
tified in prostate cancer with a similar incidence (2.7%) (Ghiam 
et al., 2012; Kang et al., 2009), we show here that those tumors 
are all ETS fusion negative and SPOP wild-type, have little SCNA 
burden, and possess elevated levels of genome-wide methylation. 
The levels of methylation observed in this methylator phenotype 
are higher than those observed in IDH1 mutant GBMs and AMLs. 
Consistent with our observations, a recently published clinical 
study of 117 prostate cancers identified a single IDH1 mutant 
prostate cancer from a 56-year-old affected individual that also 
lacked significant copy number alterations, ETS gene fusions, or 
driver mutations (Hovelson et al., 2015). Future studies in cohorts 
with sufficient clinical follow-up will be able to ask 
whether the IDH1 mutant prostate cancers are prognostically 
distinct, as noted for gliomas (Noushmehr et al., 2010) and 
AMLs (Mardis et al., 2009), and if they are sensitive to newly 
developed IDH1-targeted therapeutics (Rohle et al., 2013). 

Figure 6. Comparison of Primary with Metastatic Prostate Cancer 

(A) Metastatic prostate cancer samples have more 
copy-number alterations (top, measured as frac- 
tion of genome altered) and mutations (bottom). 

(B) The relative distribution of main subtypes (ERG, 
ETV1/4, FLI1, SPOP, FOXA1, IDH1, other) is similar 
in primary and metastatic samples. 

(C) The alteration frequencies of several genes and 
pathways are higher in metastatic samples. The 
upper bar for each gene indicates the alteration 
frequency in primary samples, the lower bar for 
metastatic samples. The most notable differences 
in alteration frequencies involve the Androgen 
Receptor pathway, the PI3K pathway, and TP53. 
See also Table S3. 

Interestingly, 26% of the tumors in this study could not be 
characterized by one of the taxonomy-defining cardinal genomic 
alterations. The 26% were clinically and genomically heteroge- 
neous, with some tumors exhibiting extensive DNA copy-num- 
ber alterations and high Gleason scores indicative of poorer 
prognosis. About a third of them were genomically similar to 
SPOP and FOXA1 mutant tumors but lacked any canonical mu- 
tation (iCluster 1, methylation cluster 2, mRNA cluster 1); others 
were enriched for mutations of TP53, KDM6A, and KMT2D or 
specific SCNAs spanning MYC and CCND1. Many of the tumors 
had low Gleason score with few if any DNA copy-number alter- 
ations and a normal-like DNA methylation pattern. As previously 
reported, tumors with fewer genomic alterations were also more 
commonly Gleason score 6 tumors (38% in the “quiet” class 
versus 8% in the class with the greatest burden of alterations). 

Tumor cellularity, as assessed by pathology review, was lower 
among tumors with fewer SCNAs (one-sided Mann-Whitney 
test, p = 0.0002), indicating that the apparent lower burden of al- 
terations in tumors with smaller volumes may be due in part to 
their tumor purities being lower. However, the lower cellularity 
of these tumors did not limit the detection of clonal molecular 
lesions since tumor cellularity between ETS fusion-positive and 
these fusion-negative tumors was not significantly different 
(two-sided Mann-Whitney test, p = 0.32). One must also keep 
in mind that this study was limited to a single tumor focus for 
each affected individual, even though the vast majority of 
primary prostate tumors are multifocal and molecular heteroge- 
neity between different foci has been demonstrated (Cooper 
et al., 2015; Boutros et al., 2015; Lindberg et al., 2015). Such 
issues must be considered when designing new therapeutic 
approaches and biomarker panels for clinical use, as affected in- 
dividuals likely have more than one of these molecular subtypes 
present due to this commonly occurring tumor multifocality and 
molecular heterogeneity. 
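
The rank-based comparisons of tumor cellularity reported above can be illustrated with a short sketch of the Mann-Whitney U test. This is not the analysis code of this study: the function name mann_whitney_less is hypothetical, the p value uses the normal approximation with a tie correction and no continuity correction (statistical packages may instead use exact tables for small samples), and the cellularity values are invented for illustration only. 

```python
import math


def mann_whitney_less(x, y):
    """Return (U_x, p) for the one-sided alternative 'x tends to be lower than y'.

    Uses mid-ranks for ties and a normal approximation for the p value.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 0.5                      # all values tied: no evidence
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_less = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return u_x, p_less


# Invented cellularity estimates (%), in 10% increments as in pathology review.
few_scna = [40, 50, 50, 60, 60, 70]
many_scna = [60, 70, 70, 80, 80, 90]
u, p = mann_whitney_less(few_scna, many_scna)
print(u, p)
```

A small U for the first group means that its values mostly rank below those of the second group, which is the direction tested by the one-sided comparison; the two-sided version doubles the smaller tail probability. 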

Primary prostate cancers exhibit a wide range of androgen 
receptor activity. This study demonstrates for the first time a 
direct association between mutations in SPOP or FOXA1 and 
increased AR-driven transcription in human prostate cancers. 
Further studies in preclinical models, as well as in clinical trial 
settings, will be required to understand the implications of vari- 
able AR activity in the contexts of chemoprevention and prostate 
cancer-directed treatment strategies (Mostaghel et al., 2010). 
Other, more immediately actionable opportunities for targeted 
therapy exist for the 19% of primary prostate cancers that 
have defects in DNA repair and for the nearly equal number of 
cancers with altered key effectors of both PI3K and MAPK path- 
ways. While the numbers of DNA repair defects found in organ- 
confined prostate tumors may be lower than those found in 
metastatic prostate cancer (Robinson et al., 2015), an increase 
in the number of such defects with disease progression suggests 
a possible advantage to targeting DNA repair-deficient tumors at 
an earlier stage of disease, perhaps at initial diagnosis. Such 
strategies may include preventing DNA damage, as well as tar- 
geting deficient DNA repair (Ferguson et al., 2015). Alterations 
in the PI3K/MTOR pathway also play an important role: beyond 
the frequent inactivation of PTEN, we document rare activation 
of PIK3CA, PIK3CB, AKT1, and MTOR, and of several small 
GTPases, including HRAS, as well as BRAF. As DNA sequencing 
of tumor samples becomes more widely adopted earlier in the 
clinical care of cancer patients, such alterations may emerge 
as candidates for inclusion in clinical trials after front-line 
therapy. 

In summary, our integrative assessment of 333 primary pros- 
tate cancers has confirmed previously defined molecular sub- 
types across multiple genomic platforms and identified novel 
alterations and subtype diversity. It provides a resource for 
continued investigation into the molecular and biological hetero- 
geneity of the most common cancer in American men. 

EXPERIMENTAL PROCEDURES 

Tumor and matched normal specimens were obtained from prostate cancer 
patients who provided informed consent and were approved for collection 
and distribution by local Institutional Review Boards. Blocks frozen in OCT 
were made of all tumors and of paired benign tissue when present. A 5 micron 
section was cut from both the top and bottom of the OCT block of 111 tumor 
cases and from the top or bottom only of the OCT block of 222 tumor cases. 
Out of 39 normal samples included in the freeze, 23 underwent pathology re- 
view, and prostate origin (i.e., no seminal vesicles) and absence of tumor and 
high grade prostate intraepithelial neoplasia (HGPIN) were confirmed. Tissue 
images were reviewed by eight genitourinary pathologists, who reported 
the primary and secondary Gleason patterns of cancer for each slide and 
estimates of tumor cellularity in 10% increments (from 0%-100%). In case 
of discrepancies of Gleason scores between the top and bottom sections, 
the Gleason scores of cancer in the section with the largest area of tumor 
were used. A subset of 54 cases was reviewed by two pathologists. Discrep- 
ancies that occurred between the two pathologists were reconciled by blind 
review by a third pathologist. 

DNA, RNA, and protein were purified and distributed throughout the TCGA 
network. Samples with evidence for RNA degradation were excluded from the 
study (Supplemental Experimental Procedures). In total, 333 primary tumors 
with associated clinicopathologic data were assayed on at least four molecular 
profiling platforms. Platforms included exome and whole genome DNA 
sequencing, RNA sequencing, miRNA sequencing, SNP arrays, DNA methyl- 
ation arrays, and reverse phase protein arrays. Integrated multiplatform ana- 
lyses were performed. 

The data and analysis results can be explored through the Broad Institute 
FireBrowse portal (http://firebrowse.org/?cohort=PRAD), the cBioPortal 
for Cancer Genomics (http://www.cbioportal.org/study.do?cancer_study_id=prad_tcga_pub), 
TCGA Batch Effects (http://bioinformatics.mdanderson.org/tcgambatch/), 
Regulome Explorer (http://explorer.cancerregulome.org/), 
and Next-Generation Clustered Heat Maps (http://bioinformatics.mdanderson.org/TCGA/NGCHMPortal/). 
See also Supplemental Information and the TCGA 
publication page (https://tcga-data.nci.nih.gov/docs/publications/prad_2015/). 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and three tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.10.025. 
Contrary to a recent report suggesting 
that a preadolescent burst of 
cardiomyocyte proliferation promotes 

expansion appears limited to the 
neonatal period, with cardiomyocyte 
hypertrophy likely accounting for the 
increase in the heart size. 
SUMMARY 

The magnitude of cardiomyocyte generation in the 
adult heart has been heavily debated. A recent report 
suggests that during mouse preadolescence, cardio- 
myocyte proliferation leads to a 40% increase in the 
number of cardiomyocytes. Such an expansion 
would change our understanding of heart growth 
and have far-reaching implications for cardiac regen- 
eration. Here, using design-based stereology, we 
found that cardiomyocyte proliferation accounted 
for 30% of postnatal DNA synthesis; however, we 
were unable to detect any changes in cardiomyocyte 
number after postnatal day 11. 15N-thymidine and 
BrdU analyses provided no evidence for a prolifera- 
tive peak in preadolescent mice. By contrast, cardio- 
myocyte multinucleation comprises 57% of post- 
natal DNA synthesis, followed by cardiomyocyte 
nuclear polyploidisation, contributing with 13% to 
DNA synthesis within the second and third postnatal 
weeks. We conclude that the majority of cardiomyo- 
cytes is set within the first postnatal week and that 
this event is followed by two waves of non-replicative 
DNA synthesis. This Matters Arising paper is in 
response to Naqvi et al. (2014), published in Cell. 
See also the associated Correspondence by Soon- 
paa et al. (2015), and the response by Naqvi et al. 
(2015), published in this issue. 

INTRODUCTION 

The replacement of cardiomyocytes has been a major challenge 
in regenerative medicine. The neonatal mouse heart exhibits 
robust myogenesis after apical resection and ischemic lesions; 
this myogenesis is mainly mediated by duplication of preexisting 
cardiomyocytes (Ali et al., 2014; Porrello et al., 2011; Puente 
et al., 2014), although there is also evidence for a contribution 
of precursor cells (c-kit-positive) to the regenerating myocar- 
dium (Jesty et al., 2012). This regenerative process seems to 
be limited to the first postnatal week (Puente et al., 2014), which 
coincides with an increase in binucleation in cardiomyocytes 
(Soonpaa et al., 1996; Walsh et al., 2010). Senyo et al. evaluated 
the degree of cardiomyocyte proliferation via the detection of the 
non-radioactive isotope 15N-thymidine in dividing cardiomyo- 
cytes. They observed limited myogenesis in both young and 
old mice, with annual proliferation rates of less than 1% (Senyo 
et al., 2013), in agreement with the human cardiomyocyte 
renewal rates established by birth dating (Bergmann et al., 
2009; 2015). 

In contrast to these findings, Naqvi et al. (2014) reported a sec- 
ond wave of cardiomyocyte proliferation during preadolescence 
that occurs in a highly synchronized fashion. Most cardiomyo- 
cytes re-entered the cell cycle starting on the evening of post- 
natal day 14, followed by mitosis and cytokinesis on postnatal 
day 15. This proliferative event remarkably increased the cardio- 
myocyte count by 40%. As most cardiomyocytes are binucle- 
ated at this stage, the authors suggested a model in which 
binucleated cardiomyocytes undergo karyokinesis, resulting in 
tetranucleated cardiomyocytes, followed by cytokinesis to 
generate two mononucleated cardiomyocytes and one binucle- 
ated cardiomyocyte. These replicative events were suggested to 
be mediated by thyroid hormone (T3) through the IGF-1/Akt 
pathway. These observations were provocative because 
another recent study demonstrated that the increase in oxygen 
at birth induces DNA damage, leading to cell-cycle arrest in car- 
diomyocytes shortly after birth (Puente et al., 2014). The findings 
reported by Naqvi et al. suggest a more complex regulation of 
cardiomyocyte proliferation and consequently have major impli- 
cations for the understanding of cardiomyocyte proliferation 
(Palpant and Murry, 2014; Zhang and Kuhn, 2014). We have 
re-examined their key observations, using both similar and alter- 
native approaches. We observed that cardiomyocyte expansion 
is restricted mainly to the first postnatal week, seriously chal- 
lenging the reported contribution of cardiomyocyte proliferation 
to the growing preadolescent mouse heart. Instead, we 
observed that, in addition to multinucleation, murine cardiomyo- 
cytes undergo polyploidization in the second and third postnatal 
weeks. This time period corresponds well to the increase in 
ploidy in human preadolescent hearts (Bergmann et al., 2009; 
Mollova et al., 2013). This study also supports recent reports of 
substantial early postnatal cardiomyogenesis without finding 
any evidence for a second wave of cardiomyocyte proliferation 
contributing to heart growth. 
Figure 1. Expansion of the Pool of Cardiomyocytes in Neonatal Mouse Hearts

(A) The labeling strategy used to unequivocally identify cardiomyocyte
nuclei and the number of nuclei per cardiomyocyte. Cardiomyocyte nuclei
were labeled with antibodies against PCM-1 (green), and the
cardiomyocyte cell borders were labeled with antibodies against
connexin43 (Cx43) and dystrophin (DMD) (red). The scale bar indicates
20 µm.

(B) The cardiomyocyte density decreased during heart growth from P2 to
P100.

(C) The number of cardiomyocyte nuclei plateaued around P11 and
remained constant thereafter (n = 3 to 6 for all analyzed time points).

(D) The process of multinucleation began early in the neonatal period.
At P9, most cardiomyocytes already contained two or more nuclei.

(E) Stereological analysis revealed that the number of cardiomyocytes
increased in the early neonatal period from 1.7 × 10^6 on P2 to
2.26 × 10^6 on P5 and reached a plateau on P11 (95% confidence interval
(blue), 95% prediction interval (red)).

(F) The results of the volume analysis of isolated cardiomyocytes on
P11 and P21. The upper panel shows cardiomyocytes isolated on P11
(yellow) and P21 (red) rendered by the Imaris software to obtain their
individual volumes (n = 3 for both groups, total of 274 cardiomyocytes
analyzed). Scale bars, 10 µm. Note the different lengths of the scale
bars in the images of cardiomyocytes from P11 and P21. The left
ventricle (black bars) increased 2.0-fold between P11 and P21,
comparable to the increase in the average cardiomyocyte volume (gray
bars) (right). * indicates p < 0.05; ** indicates p < 0.001; NS: not
significant.
All error bars indicate SD.



RESULTS 

The Final Number of Cardiomyocytes Is Mainly 
Established within the First Postnatal Week 

We determined the number of cardiomyocytes by stereology in
mouse hearts from postnatal day 2 (P2) to postnatal day 100
(P100) (Figure 1). The identification of cardiomyocyte nuclei in
tissue sections is challenging, particularly in the perinatal
period when the cell density is very high (Ang et al., 2010;
Soonpaa and Field, 1998). To circumvent this problem, we
used antibodies against the cardiomyocyte nuclear marker
pericentriolar material 1 (PCM-1) (Figure 1A) (Bergmann and
Jovinge, 2012; Bergmann et al., 2011; Gilsbach et al., 2014).
We obtained the mass of the left ventricle by weighing the
left ventricle including the septum (Figures S1A and S1B).
The reference volume was calculated using the tissue density
of the myocardium (1.06 g/cm^3) (Bruel and Nyengaard, 2005).
The density of myocyte nuclei gradually decreased from
367,504 ± 42,055/mm^3 (mean ± SD) on P2 to 128,983 ±
13,555/mm^3 on P20 and 55,085 ± 10,574/mm^3 on P100
(Figure 1B).



Next, we established the number of postnatal cardiomyocyte
nuclei (Experimental Procedures). The number of cardiomyocyte
nuclei expanded continuously between P2 (1.87 × 10^6 ± 0.20 ×
10^6) and P11 and plateaued thereafter (4.75 × 10^6 ± 1.04)
(ANOVA, post hoc Holm-Sidak, p > 0.05) (Figure 1C). The ratio
of mono- to multinucleated cardiomyocytes changed substantially
during the first 10 postnatal days. On P2, the majority
(93.2% ± 4.4%) of cardiomyocytes remained mononucleated,
in agreement with previous studies, whereas on P11, only
20.2% ± 2.2% were mononucleated, and 78.4% ± 1.5% were
binucleated. We did not observe any further changes in this ratio
between P11 and P100 (Figure 1D). Taking multinucleation into
account, we established that cardiomyocytes expanded by
approximately 40% between P2 (1.7 × 10^6 ± 0.2 × 10^6) and P5
(2.3 × 10^6 ± 0.2 × 10^6) (ANOVA, post hoc t test with
Holm-Bonferroni correction, p < 0.05) and then plateaued on P11
(2.6 × 10^6 ± 0.6 × 10^6) and remained constant at least until P100
(linear regression, R = 0.016, p = 0.935, Figures 1E and S1C).
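
The step of "taking multinucleation into account" can be illustrated with a short sketch that converts a nucleus count into a cell count by dividing by the mean number of nuclei per cell. This is not the authors' stereological procedure. The function name is hypothetical, and the assignment of the unreported remainder (6.8% on P2, 1.4% on P11) to binucleated and tetranucleated cells, respectively, is an assumption made only for illustration.

```python
def cardiomyocyte_count(nuclei_total, nucleation_fractions):
    """Estimate the cell number from a nucleus count and a nucleation profile.

    nucleation_fractions maps nuclei-per-cell -> fraction of cells.
    """
    total = sum(nucleation_fractions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("nucleation fractions must sum to 1")
    mean_nuclei_per_cell = sum(k * f for k, f in nucleation_fractions.items())
    return nuclei_total / mean_nuclei_per_cell


# P2: 93.2% mononucleated (reported); remainder ASSUMED to be binucleated.
p2_cells = cardiomyocyte_count(1.87e6, {1: 0.932, 2: 0.068})

# P11: 20.2% mono- and 78.4% binucleated (reported);
# remaining 1.4% ASSUMED to be tetranucleated.
p11_cells = cardiomyocyte_count(4.75e6, {1: 0.202, 2: 0.784, 4: 0.014})

print(f"P2:  {p2_cells / 1e6:.2f} million cardiomyocytes")
print(f"P11: {p11_cells / 1e6:.2f} million cardiomyocytes")
```

Under these assumptions the sketch yields roughly 1.75 and 2.60 million cells, in line with the reported estimates of 1.7 × 10^6 on P2 and 2.6 × 10^6 on P11.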

To investigate the contribution of the increase in cardiomyo- 
cyte volume to the preadolescent growth of the left ventricle 
(Leu et al., 2001), we determined the average volume of 
Figure 2. Cardiomyocyte DNA Synthesis in the Growing Mouse Heart

(A) Two EdU pulses were given on 2 consecutive days. Mice were sacrificed after a 24-hr chase period.

(B) EdU incorporation was detected in cardiomyocyte nuclei by co-labeling with PCM-1, and connexin43 (Cx43) and dystrophin (DMD) staining were used to
delineate cell borders. Arrows indicate EdU-positive cardiomyocyte nuclei. Scale bars, 10 µm.

(C) The postnatal DNA synthesis in cardiomyocytes was highest in mice sacrificed on P7 (~17%), and rapidly declined thereafter to values below 1% on P15
(n = 3-4, for all analyzed time points).

(D) Mononucleated cardiomyocytes incorporated most of the EdU on P3, but by P7, most cardiomyocytes were bi- or multinucleated. The relative increase in the
number of mononucleated EdU-positive cardiomyocytes on P13 may be related to the increase in nuclear ploidy (Figure 3). ** indicates p < 0.001.

All error bars indicate SD.



cardiomyocytes on day P11 (4,530 ± 1,410 µm^3) and P21
(8,820 ± 3,120 µm^3) (t test, p < 0.01) (Figure 1F and Experimental
Procedures). We determined that the 2.0-fold increase in
the volume of the left ventricle (P11: 20.1 ± 1.3 mm^3 and P21:
40.8 ± 1.6 mm^3) could be fully explained by the volume increase
(2.0-fold) of cardiomyocytes, indicating mainly hypertrophic
growth of the left ventricle and establishment of the full
complement of cardiomyocytes by P11.
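
The hypertrophy argument rests on comparing two fold changes, which can be checked with the following sketch. It uses only the rounded means reported above. Treating the ratio of the two fold changes as the implied change in cell number assumes that cardiomyocytes occupy a constant fraction of the left ventricular volume, which is an assumption of this illustration, not a statement from the study.

```python
# Reported means (rounded values from the text).
lv_volume_mm3 = {"P11": 20.1, "P21": 40.8}      # left ventricle volume
cm_volume_um3 = {"P11": 4530.0, "P21": 8820.0}  # mean cardiomyocyte volume

lv_fold = lv_volume_mm3["P21"] / lv_volume_mm3["P11"]
cm_fold = cm_volume_um3["P21"] / cm_volume_um3["P11"]

# ASSUMPTION: constant cardiomyocyte volume fraction of the ventricle,
# so any growth not explained by cell volume reflects a change in cell number.
implied_number_ratio = lv_fold / cm_fold

print(f"LV volume fold change:         {lv_fold:.2f}")
print(f"Cardiomyocyte volume fold:     {cm_fold:.2f}")
print(f"Implied cell number ratio:     {implied_number_ratio:.2f}")
```

With the rounded means, the cell volume increases about 1.95-fold and the ventricle about 2.03-fold, leaving an implied change in cell number of only a few percent, far from the 40% increase under discussion.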

Postnatal DNA Synthesis Changes Mononucleated 
Cardiomyocytes into Multinucleated Cardiomyocytes 

The mice were treated with the thymidine analog EdU on two 
consecutive days and sacrificed one day after the second treat- 
ment (Figure 2A). The frequency of EdU incorporation was 
determined by co-labeling with antibodies against cardiomyocyte
nuclei (PCM-1) (Figures 2B and 2C). The EdU labeling frequency
peaked on P7 (17.3% ± 3.1%) (ANOVA, p < 0.001,
post hoc Holm-Sidak, p < 0.001), followed by a continuous
decline to ratios of less than 1% on P15. On P100, we identified
only one EdU-labeled cardiomyocyte nucleus among 980 
analyzed nuclei (Figure 2C). Next, we investigated the distribu- 
tion of EdU incorporation into mono- and multinucleated cardi- 
omyocytes (Figures 2B and 2D). In agreement with cardiomyo- 
cyte proliferation, EdU incorporation was predominantly 
observed in mononucleated cardiomyocytes on P3 (95.8% ± 
1.9%) and rapidly changed during the first postnatal week 
(P7), when most EdU-labeled cardiomyocytes became binucle- 
ated (mononucleated: 7.3% ± 3.5%, binucleated: 88.7% ± 
5.2%, and multinucleated: 4.0% ± 2.2%) (Figure 2D). Our data 
support a previous report suggesting an early postnatal switch 
from cardiomyocyte proliferation to multinucleation (Soonpaa 
et al., 1996; Walsh et al., 2010). 

Cardiomyocyte DNA Content Increases during the 
Second and Third Postnatal Weeks 

Many cardiomyocyte nuclei undergo DNA synthesis to become
polyploid in humans (Adler, 1991; Bergmann et al., 2009; Herget
et al., 1997; Mollova et al., 2013). However, ploidy changes in
juvenile mouse hearts have not been systematically investigated.
We therefore determined the DNA content of murine cardiomyocyte
nuclei on different postnatal days by flow cytometry (Figures
3A to 3D, and S2). Most cardiomyocyte nuclei were diploid
around birth until P9 (Figures 3C, left, and S2), corresponding
to 104.7% ± 0.4% in Figure 3D. Between postnatal weeks 2
and 3, the average DNA content per nucleus increased by
approximately 10% in the left ventricle (Figures 3C, right, and
S2) to 115.5% ± 2.3% (Figure 3D, ANOVA p < 0.001) and
remained constant thereafter until P100 (Figure 3D). This finding
is also reflected by an increase in the ploidy of the EdU-labeled
cardiomyocyte nuclei between P3 and P13 (Figures 3E and 3F).
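
The DNA content index can be translated into a fraction of tetraploid nuclei, using the scale given in the Figure 3 legend (100% for a pure diploid population, 200% for a pure tetraploid population). The sketch below assumes that only diploid and tetraploid nuclei are present, which is a simplification introduced here; the function name is hypothetical.

```python
def tetraploid_fraction(dna_index_percent):
    """Fraction of 4n nuclei implied by a mean DNA content index.

    ASSUMPTION: the population contains only 2n and 4n nuclei, so that
    index = 100 * (1 - f) + 200 * f = 100 * (1 + f).
    """
    fraction = dna_index_percent / 100.0 - 1.0
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("index must lie between 100% and 200%")
    return fraction


before = tetraploid_fraction(104.7)  # around birth until P9
after = tetraploid_fraction(115.5)   # after postnatal weeks 2 to 3

print(f"Tetraploid nuclei before: {before:.1%}")
print(f"Tetraploid nuclei after:  {after:.1%}")
print(f"Relative increase in mean DNA content: {115.5 / 104.7 - 1:.1%}")
```

Under this two-population assumption, the index values correspond to about 5% tetraploid nuclei before and about 15% afterwards, consistent with the roughly 10% rise in average DNA content per nucleus.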

Cardiomyocytes Exhibit No Enhanced Cell-Cycle
Activity in Preadolescent Mice

Ki-67 is expressed in all phases of the cell cycle including mitotic
events (Scholzen and Gerdes, 2000). Thus, we used Ki-67 to
exclude substantial cell-cycle activity on P14 and P15 as
described by Naqvi et al. (2014). We sacrificed postnatal mice
on P14 at 9:00 p.m., on P15 at noon, and on P16 at 9:00 a.m.
The shorter length of our analysis intervals (14 hr and 21 hr)
compared to the average mammalian cell-cycle length of
approximately 24 hr (Hahn et al., 2009; Ponti et al., 2013) would
allow us to detect any cell-cycle activity on P15. However, we
could not detect any significant difference between P14, P15,
and P16 (Holm-Sidak method, p > 0.05). By contrast, all three
time points exhibited Ki-67 frequencies of less than 2%,
significantly lower than that on P7 (ANOVA, p <
0.001) (Figures 4A and 4B).

PCM-1 disassembles in pro-metaphase and metaphase of 
mitosis (Srsen et al., 2009). Although PCM-1 can be detected 
in all other phases of the cell cycle, we excluded the possibility 
that we underestimated the number of cycling cardiomyocytes 
by also using antibodies against cardiac troponin I to identify car- 
diomyocyte nuclei by their location in the cardiomyocyte cyto- 
plasm (Figure 4B). 
Figure 3. Cardiomyocyte Nuclear Ploidy in the Growing Mouse Heart

(A and B) Cardiomyocyte nuclei were labeled with isotype controls (A) and antibodies against PCM-1 (B).

(C) Two representative flow cytometry histograms depict the DNA content of cardiomyocyte nuclei (PCM-1-positive) from P7 and P18 mice.

(D) Time course of the cardiomyocyte DNA content per nucleus. Cardiomyocyte nuclei undergo polyploidization during the second and third postnatal weeks
(95% confidence interval [blue], 95% prediction interval [red]; 100% corresponds to a pure diploid population [2n], and 200% to a pure tetraploid population [4n]).
(E and F) Flow cytometric analysis of the EdU-positive cardiomyocyte nuclear ploidy (see Figure 2A for the time course of EdU injection). (E) Representative flow
cytometry plots for immunolabeling of P7 animals with antibodies against PCM-1 and EdU detection (PCM-1+/EdU+). Note that the different fluorescent
intensities within the EdU-positive populations indicate different levels of EdU incorporation. Cells may have gone through the S-phase twice after BrdU was
delivered. Lower left: flow cytometry plots with isotype control antibodies and without EdU detection; upper left: flow cytometry plots with PCM-1 antibodies and
without EdU detection.

(F) The nuclear ploidy of EdU-positive cardiomyocyte nuclei measured at different time points (P3 to P13). The nuclear ploidy began to increase on P9. 2n: diploid;
4n: tetraploid; SSC: side scatter.

All error bars indicate SD.



BrdU Incorporation Reveals Only Minimal DNA Synthesis 
in Preadolescent Cardiomyocytes 

Naqvi et al. proposed a highly synchronized last round of cardi- 
omyocyte proliferation beginning on the evening of P14. Thymi- 
dine analogs such as BrdU have a short biological half-life. 
Therefore, it might be difficult to detect the suggested cardio- 
myocyte DNA synthesis on P15 with a single BrdU injection. 
Consequently, we subcutaneously implanted a pellet on P13 
that continuously released BrdU (Experimental Procedures and 
Figures 4C to 4I). BrdU incorporation into cardiomyocyte nuclei 
(PCM-1-positive) was detected by immunohistochemistry (IHC)
(Figures 4D and 4E) and flow cytometry on P18 (Figures 4F to
4H). Both experimental strategies revealed a BrdU incorporation
rate of less than 3% in cardiomyocyte nuclei (IHC: 1.8% ± 0.9%;
flow cytometry: 2.9% ± 1.4%) (Figure 4I). An analysis of the DNA
content in BrdU-positive nuclei revealed an increase in nuclear
ploidy in cardiomyocytes compared to non-cardiomyocytes
(Figure 4H). These findings exclude the possibility of a major
proliferative cardiomyocyte burst between P13 and P18. By
contrast, the majority of BrdU-positive cardiomyocyte nuclei
were tetraploid, indicating that DNA synthesis is linked to
polyploidization rather than proliferation after P13 (also see Figure 3).
Furthermore, we did not observe any differences in the BrdU
labeling frequency of endocardial (1.4% ± 0.4%) or epicardial
cardiomyocyte (1.3% ± 0.6%) nuclei (paired t test, p = 0.93) (Figures
4D and S3).

Multi-Isotope Mass Spectrometry Analyses of 15N 
Thymidine Incorporation Provide No Evidence of a 
Second Wave of Cardiomyocyte Proliferation in 
Preadolescent Mice 

We continuously administered 15N-thymidine (Experimental 
Procedures) in preadolescent mouse hearts (P13-P23) (Fig- 
ure 5A) to exclude the possibility that continuous delivery of 
the halogenated thymidine analogs (BrdU or EdU) might be 
associated with toxic effects that alter cell turnover (Andersen 
et al., 2013; Wilson et al., 2008). Nonradioactive stable isotope 
tracers such as 15N-thymidine do not alter biochemical reac- 
tions and are not harmful to the animal (Steinhauser et al., 
2012). Multi-isotope imaging mass spectrometry (MIMS) allows 
the simultaneous detection of the stable isotopes of the same 
element (Senyo et al., 2013). Due to the low natural abundance 
of 15N (0.37%), the incorporation of a 15N-labeled tracer is 
readily detected due to an increase in the 15N:14N ratio. 
Because DNA contains a high amount of phosphate, we also 
visualized phosphorus (31P) in addition to the 15N:14N ratio to
identify cardiac nuclei with high spatial resolution (Figures 5B
to 5D). At a lateral resolution of less than 100 nm,



Cell 163, 1026-1036, November 5, 2015 ©2015 Elsevier Inc. 1029







cardiomyocytes were identified based on the cell borders and 
their subcellular specific ultrastructure, allowing the identifica- 
tion of cardiomyocyte nuclei based on their location (Figures 
5B to 5D). The nuclear 15N integration was evenly distributed 
throughout the investigated myocardium (Figure S4). In addition 
to analyzing 15N thymidine incorporation on sections, we iso- 
lated cardiomyocyte nuclei by flow cytometry (as shown in Fig- 
ure 3A and 3B), and determined 15N thymidine incorporation 
subsequently (Figure 5E). In both experimental designs we 
observed 15N-thymidine incorporation in only a small fraction 
of cardiomyocyte nuclei between P13 and P23 (0.9% to 2.1%) 
(t test, p > 0.05) (Figures 5B to 5F), comparable to the data 
obtained by BrdU infusion (Figures 4C to 4I) and not compatible 
with the robust cardiomyocyte proliferation suggested by 
Naqvi et al. 
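
The detection principle described above, an increase in the 15N:14N ratio over the low natural background, can be sketched as follows. The natural abundance of 0.37% is taken from the text. Defining "percent excess" as the relative increase of the measured ratio over the natural ratio is an assumption of this sketch (chosen to match the 0% to 300% excess scale in the Figure 5 legend), and the function names are hypothetical.

```python
NATURAL_15N_ABUNDANCE = 0.0037  # 0.37%, as stated in the text


def natural_ratio(abundance=NATURAL_15N_ABUNDANCE):
    """15N:14N ratio corresponding to a given 15N abundance."""
    return abundance / (1.0 - abundance)


def percent_excess(measured_ratio, reference_ratio=None):
    """Relative increase of a measured 15N:14N ratio over the natural ratio.

    ASSUMPTION: 'excess' is defined relative to the natural ratio.
    """
    if reference_ratio is None:
        reference_ratio = natural_ratio()
    return (measured_ratio / reference_ratio - 1.0) * 100.0


background = natural_ratio()
print(f"Natural 15N:14N ratio: {background:.5f}")
print(f"Unlabeled nucleus:     {percent_excess(background):.0f}% excess")
print(f"Fourfold enriched:     {percent_excess(4 * background):.0f}% excess")
```

Because the background ratio is so small, even modest incorporation of the tracer produces a large relative excess, which is why labeled nuclei stand out clearly.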

An Integrated Model of DNA Synthesis in Postnatal 
Cardiomyocytes 

Based on our data obtained by design-based stereology (Figures
1A to 1E) and by flow cytometric analysis (Figure 3D), we
established a quantitative model of DNA synthesis in postnatal
cardiomyocytes (Figures 6A to 6C). Cardiomyocyte proliferation is
highest at birth, followed by a phase of multinucleation reaching
a maximum around P7. The polyploidization of cardiomyocyte
nuclei reflects the last wave of DNA synthesis, peaking around
P14 (Figure 6B). Derivative graphs depicting the DNA content
changes per time unit were calculated based on absolute
quantifications of cardiomyocyte nucleus number (Figure 1C),
cardiomyocyte ploidy (Figure 3D), and cardiomyocyte number
(Figure 1E) without relying on markers of proliferation or the
incorporation of thymidine analogs with unknown biological
half-lives. As the number of cardiomyocytes and the number of
cardiomyocyte nuclei change substantially postnatally, we used
the time of birth as a reference for calculating postnatal DNA
synthesis. Multinucleation accounted for 57% of the total postnatal
DNA synthesis, cardiomyocyte number expansion was related to
30%, and polyploidization reflected 13% of this synthesis,
mainly in the second and third postnatal weeks (areas under
the curves, see Figure 6).
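
As a rough plausibility check of these proportions, the total DNA content (in diploid genome equivalents) can be split into three sequential contributions using only the endpoint values reported above. This is a back-of-the-envelope decomposition, not the area-under-the-curve calculation used for Figure 6. It assumes a fixed order of events (cell number expansion, then multinucleation, then polyploidization), uses rounded endpoint values, and treats the mean DNA index of 104.7% as the value at P2.

```python
# Endpoint values taken from the text (rounded).
cells_p2, cells_plateau = 1.7e6, 2.6e6      # cardiomyocytes
nuclei_p2, nuclei_plateau = 1.87e6, 4.75e6  # cardiomyocyte nuclei
ploidy_p2, ploidy_plateau = 1.047, 1.155    # DNA index / 100 (1.0 = diploid)

# ASSUMPTION: events occur strictly in sequence; the split depends on this order.
nuclei_per_cell_p2 = nuclei_p2 / cells_p2

# 1. New cells, each carrying the P2 nucleus number and P2 ploidy.
proliferation = (cells_plateau - cells_p2) * nuclei_per_cell_p2 * ploidy_p2
# 2. Additional nuclei beyond those explained by new cells, at P2 ploidy.
multinucleation = (nuclei_plateau - cells_plateau * nuclei_per_cell_p2) * ploidy_p2
# 3. Increase in DNA content per nucleus across all final nuclei.
polyploidization = nuclei_plateau * (ploidy_plateau - ploidy_p2)

total = nuclei_plateau * ploidy_plateau - nuclei_p2 * ploidy_p2
shares = {
    "proliferation": proliferation / total,
    "multinucleation": multinucleation / total,
    "polyploidization": polyploidization / total,
}
for name, share in shares.items():
    print(f"{name:17s} {share:.1%}")
```

The sketch gives approximately 29%, 56%, and 15%, which is close to the reported 30%, 57%, and 13%. The small differences are expected given the rounded inputs and the simplified sequential model.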

DISCUSSION 

The postnatal heart has a robust capability to generate new 
myocardium after apical dissection and injury, although whether 
induced myocardiogenesis leads to complete myocardial regen- 
eration remains controversial (Andersen et al., 2014; Jesty et al., 
2012; Mahmoud et al., 2013; Sadek et al., 2014). Until recently, 
whether neonatal myocardiogenesis is a physiological phenom- 
enon or whether injury is required to initiate substantial cardio- 
myocyte proliferation has been unknown. We demonstrated 
that, even in the uninjured neonatal heart, a substantial number 
of cardiomyocytes are generated within the first postnatal 
week, in agreement with the report by Naqvi et al. (Naqvi et al., 
2014). However, by several distinct approaches, we failed to 
observe any substantial increase in the cardiomyocyte number 
or proliferation rate between P13 and P100.

We first used design-based stereology to establish the
cardiomyocyte cell count in the left ventricle. Stereology is a powerful
tool to obtain unbiased estimates of cell and nucleus numbers in
different organ systems (Bergmann et al., 2015; Yeung et al.,
2014). This technique relies on the accurate identification of
cardiomyocyte nuclei; we and others have successfully utilized
pericentriolar material 1 (PCM-1) as a specific marker for
cardiomyocyte nuclei (Bergmann and Jovinge, 2012; Bergmann et al.,
2011; Bergmann et al., 2015; Gilsbach et al., 2014; Preissl et al.,
2015). In contrast to cell isolation strategies, stereological
estimates are not dependent on isolation efficiency. Even with the
most efficient method, retrograde perfusion using a Langendorff
system, the expected yield of cardiomyocytes is never 100%
and varies between ~1.5 and ~2.5 million in mice (older than
P17) (Naqvi et al., 2014). Our cardiomyocyte number estimate
is supported by previous studies that reported similar numbers
of cardiomyocyte nuclei and cardiomyocyte nucleus density in
young adult mice (Adler et al., 1996; Bersell et al., 2009). Taking
our stereological estimation as a reference for the cardiomyocyte
count (Figure 1E), the cardiomyocyte isolation efficiency of
Naqvi et al. is estimated to be only 50% to 65% depending on the
analyzed time points (Figures S1D and S1E). Because the
amount and composition of the extracellular matrix change
dramatically in growing hearts (Anderson, 2010), it is likely that
the reported cardiomyocyte increase between P15 and P17
could, at least in part, be explained by age-dependent
differences in the cardiomyocyte isolation efficiencies. In this case,
changes in isolation efficiency of as little as 15%-20% could
explain the discrepancy between the data of Naqvi et al. and
our data (Figures S1D and S1E).

We performed Ki-67 staining on P14, P15, and P16 and
obtained no evidence of extensive cardiomyocyte proliferation
(<2% labeling frequency). By contrast, we observed that up to
13% of cardiomyocytes were in the cell cycle on P7, when
most multinucleation occurs. Ki-67 expression is not restricted
to mitosis but can be detected in all cell-cycle stages (G1, S,
G2, and mitosis). For a cell-cycle length of approximately 24 hr
(Hahn et al., 2009; Ponti et al., 2013), an extensive burst of
proliferation would have been readily detected by this labeling
strategy. One of the major caveats of the immunohistochemistry
approaches of Naqvi et al. is the lack of a nuclear marker to
unequivocally identify cardiomyocytes as discussed above. Without
using a cardiomyocyte nuclear marker, it is difficult to distinguish
cardiomyocytes from non-cardiomyocytes (Ang et al., 2010),
particularly in growing hearts in which the nucleus density can
be more than 10 times higher than in adult hearts (Bergmann
et al., 2015). It is our impression that the images provided in
Naqvi et al. do not allow an accurate discrimination between
aurora B-positive cardiomyocytes and non-cardiomyocytes
(see Naqvi et al., Figures 3B, S2, and S3). Moreover, Naqvi
et al. reported that approximately 30% of isolated cardiomyocytes
are positive for aurora B (compared to 15% measured in
tissue sections). This labeling frequency exceeds mitotic
rates in embryonic hearts by far (embryonic day 14.5: 13.9% in
G2/M phase) (Walsh et al., 2010). The measured high aurora B
frequency of 30% in the cardiomyocyte compartment would
result in an even more dramatic increase in the number of
cardiomyocytes than reported by Naqvi et al. (22% based on
Langendorff preparation on P15). Given the estimated short duration of
mitosis (approx. 1.8 hr in ventricular cardiomyocytes) (Mollova



[Figure 4 panels (image not reproduced): no BrdU treatment; BrdU treatment; PCM-1+/BrdU+; PCM-1-/BrdU+; axes: BrdU, DNA.]



et al., 2013), 30% of mitotic cardiomyocytes in this four-hour 
window would translate to an expansion of the cardiomyocyte 
compartment by 67% (30% × 4 hr/1.8 hr). Therefore, it seems
reasonable to assume that Naqvi et al. have substantially overes- 
timated the number of aurora B-positive cardiomyocytes in their 
study. 
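
The arithmetic behind the 67% figure can be written out explicitly. The sketch follows the calculation given in the text (mitotic fraction multiplied by the ratio of the observation window to the duration of mitosis); the function name is hypothetical, and the calculation assumes that the mitotic fraction stays constant across the whole window.

```python
def implied_expansion(mitotic_fraction, window_hr, mitosis_duration_hr):
    """Fraction of cells that would divide during the observation window.

    ASSUMPTION: the mitotic fraction is constant over the window, so cells
    pass through mitosis window_hr / mitosis_duration_hr times in succession.
    """
    return mitotic_fraction * window_hr / mitosis_duration_hr


# Values from the text: 30% aurora B-positive cells, a four-hour window,
# and a mitosis duration of approximately 1.8 hr.
expansion = implied_expansion(0.30, 4.0, 1.8)
print(f"Implied expansion of the cardiomyocyte compartment: {expansion:.0%}")
```

An implied expansion of about two-thirds within four hours is well above the 40% increase reported over the entire proliferative event, which illustrates why the aurora B frequency appears overestimated.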

One of the most prevalent techniques for detecting cell prolif- 
eration is the administration and immunohistochemical detection 
of thymidine analogs such as BrdU or EdU. EdU injection in the 
afternoon on P13 and P14 resulted in a fraction of less than 1%
labeled cardiomyocyte nuclei when analyzed on P15 (Figure 2C).
However, given the relatively short biological half-life of
thymidine analogs, a highly synchronized proliferation of
cardiomyocytes on P14 and P15 could go undetected. Therefore, we
continuously administered BrdU by implanting a subcutaneous
pellet on P13. In agreement with our other results, less than 3%
of all cardiomyocyte nuclei had incorporated BrdU on P18
(Figure 4I), which is not compatible with the 40% increase in the
cardiomyocyte cell number reported by Naqvi et al. We were also
unable to confirm the regional differences in cardiomyocyte
proliferation (epicardial versus endocardial regions) suggested by
Naqvi et al. (Figure 4D). Furthermore, the BrdU-labeled
cardiomyocyte nuclei between P13 and P18 were mainly tetraploid
(Figure 4H), indicating that the majority of DNA synthesis after
P13 can be attributed to polyploidization and not proliferation.

In contrast to our present study, Naqvi et al. performed a single
BrdU pulse on the night of P14, which was sufficient to label a
substantially larger fraction (11.3%) of mainly diploid
cardiomyocyte nuclei than we detected, after a chase period of
4 days. Although their BrdU pulse-chase experiments do not
support the reported 40% increase in the cardiomyocyte cell
number, their BrdU figure is still several-fold higher than in our
present study. Although it is difficult to draw firm conclusions
without a detailed analysis of primary data, our BrdU data
obtained by both flow cytometry and immunohistochemistry
point to the possibility that the cardiomyocyte nucleus isolation
strategy of Naqvi et al. was not rigorous enough to sufficiently
remove non-cardiomyocyte nuclei from the analysis.

Because thymidine analogs (e.g., BrdU) induce the proliferation
of hematopoietic stem cells (Wilson et al., 2008) and
mediate toxic effects (Andersen et al., 2013), we continuously
delivered biologically inert 15N-thymidine to detect DNA synthesis
between P13 and P23 in cardiomyocytes. The labeling
frequencies of cardiomyocyte nuclei were consistent with the
results of our previous experiments using BrdU. Less than 3% of
all cardiomyocyte nuclei incorporated 15N-thymidine during
preadolescence.

In the present study, we utilized rigorous strategies including 
design-based stereology, cell-cycle marker analysis (Ki-67), 
EdU pulse chase experiments, and continuous delivery of 
BrdU and 15N-thymidine. Furthermore, we analyzed cardiomyocyte
proliferation both in sections and by flow cytometry without
finding any evidence for a burst of proliferation or a distinct mode
of cell division between P13 and P18. Instead, we observed
completion of cardiomyocyte number expansion no later than 
P11. We further demonstrated that young cardiomyocytes 
exhibit multinucleation followed by polyploidization as the cells 
increase in size. 

Cardiomyocytes lose their ability to complete the cell cycle to
increase their number. This loss leads to a progressive restriction
of the cardiomyocyte cell cycle in which cytokinesis is first
exchanged for multinucleation, which in turn is exchanged for
polyploidization. In human hearts, the pool of cardiomyocytes and
the degree of multinucleation are established soon after birth
(Bergmann et al., 2015; Mayhew et al., 1997), although one report
has suggested that the cardiomyocyte number increases until
adulthood (Mollova et al., 2013). Polyploidization occurs in both
mouse and human cardiomyocytes, to different degrees. In
humans, approximately 60% of all cardiomyocyte nuclei become
polyploid (Bergmann et al., 2015), whereas only 10% of all
cardiomyocyte nuclei become polyploid in mice. However,
polyploidization occurs after multinucleation at a similar time point in both
species, mostly in preadolescence. In mice, polyploidization occurs
mainly in the second and third postnatal weeks. Therefore, it partly
overlaps with the increase in the number of cardiomyocytes between
P15 and P18 suggested by Naqvi et al. However, the limited extent of
polyploidization (approximately 10%) makes it unlikely that Naqvi
et al. have mistaken the moderate ploidy increase for proliferation.

Given the important role of IGF-1 signaling in cardiomyocyte 
growth and hypertrophy (Carrasco et al., 2014; Delaughter 
et al., 1999), one might speculate that the T3-induced activation 
of the IGF-1/IGF-1-R/Akt pathway, as suggested by Naqvi et al.,
mainly triggers hypertrophic cardiomyocyte growth associated 
with polyploidization rather than cardiomyocyte proliferation in 
preadolescent mice. 

We used the C57BL/6N mouse strain, which is a substrain related to
the strain (C57BL/6J) used by Naqvi et al. (Naqvi et al., 2014). A
comparison of several strains, including ICR/CD1, C3Heb/FeJ,
C57BL/6BRL, and C57BL/6J, showed little variation regarding
the proliferative capacity and growth in the postnatal mouse



Figure 4. No Evidence for a Peak of Cardiomyocyte Proliferation in Preadolescent Mice

(A-B) Mice were sacrificed on P7, P14 (7 pm), P15 (noon), and on P16 (9.00 am). PCM-1 (red) and cardiac troponin I (gray) labeling identified cardiomyocyte nuclei
(n = 3 for all analyzed time points). Cycling cardiomyocytes were identified by Ki-67 (green) expression. Arrows indicate Ki-67-positive cardiomyocyte nuclei. The
black bar depicts the Ki-67 frequency on P15. The scale bar indicates 20 µm.

(C-I) BrdU was continuously delivered via subcutaneous pellets from P13 to P18 (Experimental Procedures). BrdU incorporation into cardiomyocyte nuclei
(PCM-1-positive) was detected by immunohistochemistry (D, E, and I), and by flow cytometry (F to I). (D) Tile scans of a horizontal section of the mouse heart and
high-magnification images (E) revealed BrdU integration into cardiomyocyte (arrows) and non-cardiomyocyte nuclei (arrowheads). Dashed lines indicate the
endocardium. Scale bars: 50 µm in (D) and 10 µm in (E). (F and G) The results of flow cytometric analyses of non-BrdU-treated mice (F) and BrdU-treated mice (G).
(H) Most BrdU incorporation between P13 and P18 could be detected in tetraploid cardiomyocyte nuclei, whereas most non-cardiomyocyte nuclei were diploid.
The different fluorescence intensities within the BrdU-positive populations indicate different levels of BrdU incorporation. Cells may have gone through the
S-phase twice after BrdU was delivered. (I) In heart sections (n = 3) and in isolated cardiac nuclei (n = 4), less than 3% of all cardiomyocyte nuclei incorporated
BrdU between P13 and P18. By contrast, ~15% of all non-cardiomyocyte nuclei incorporated BrdU. * indicates p < 0.05.

All error bars indicate SD.
heart (Haubner et al., 2012; Leu et al., 2001; Porrello et al., 2011;
Soonpaa et al., 1996). Therefore, it is unlikely that genetic
differences between the substrains used can explain the fundamental
differences between our study and that of Naqvi et al.

In summary, the present study demonstrates that the mouse 
heart displays a robust generation of cardiomyocytes postna- 
tally. In the uninjured neonatal mouse heart, up to 30% of all car- 
diomyocytes are generated even after postnatal day 2, and the 
full complement of cardiomyocytes (>95%) is reached after 
11 days. Thus, there is a strong indication that the neonatal
mouse heart harbors cues for the induction of de novo
myocardiogenesis to regenerate the diseased adult heart.



Figure 5. MIMS Analysis of Thymidine Incorporation in Cardiomyocytes of Preadolescent Mice

(A-E) 15N-thymidine was continuously delivered via a subcutaneous
pellet between P13 and P23 (Experimental Procedures). (B to D) Cardiac
nuclei were identified by MIMS-based detection of 31P (left) and 14N
(middle). Images for 14N were obtained on the species [...]. The 14N
images reveal histological and subcellular details (cell borders and
organelles) and therefore permit the identification of cardiomyocyte
nuclei (arrows). 15N:14N hue-saturation-intensity images revealed an
increase in the 15N:14N isotope ratios above natural abundances (blue,
0% excess; red, 300% excess) and therefore indicated 15N-thymidine
incorporation into cardiac nuclei (arrows). (E) 15N-thymidine
incorporation in isolated cardiomyocyte nuclei (PCM-1 positive) by flow
cytometry (as shown in Figures 3A and 3B). Arrows indicate
cardiomyocyte nuclei with 15N-thymidine incorporation.

(F) Both MIMS strategies revealed that less than 2.1% of all
cardiomyocyte nuclei incorporated 15N-thymidine between P13 and P23
(n = 6). Images B to D are indexed in Figure S4A. Scale bars indicate
20 µm. NS: not significant.

All error bars indicate SD.



EXPERIMENTAL PROCEDURES 
Animals 

Whole litters of five to seven C57BL/6N mice with
confirmed birth dates were used. Size variations
between animals of the same age were minimal
(the mean coefficient of variation among all
age groups was 10%). Whenever possible, littermates
were chosen. For the analysis of mice on postnatal
day 100, only male animals were used.
Mice received EdU (50 mg/kg) by daily subcutaneous
or intraperitoneal injections. BrdU and
¹⁵N-thymidine (Euriso-top) were delivered continuously
(0.5 mg/day, corresponding to 60-70 mg/kg/day)
to P13 mice by subcutaneously implanted
pellets (Innovative Research of America, USA),
which were in place for 5 and 10 days, respectively.
All procedures were approved by the local
ethics committee. Hearts were removed without
perfusion, and the left ventricle including the
septum was dissected. The wet weight of the heart and the ventricles was
determined with fine scales (±0.1 mg).
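
The conversion between the pellet release rate (0.5 mg/day) and the weight-normalized dose (60-70 mg/kg/day) can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the body weights it returns are implied by the two stated figures rather than reported in the text.

```python
def dose_mg_per_kg_per_day(release_mg_per_day: float, body_weight_g: float) -> float:
    """Weight-normalized dose for a pellet with a constant daily release."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return release_mg_per_day / (body_weight_g / 1000.0)


def implied_body_weight_g(release_mg_per_day: float, dose_mg_per_kg_per_day: float) -> float:
    """Body weight (g) at which a given release rate yields a given dose."""
    return 1000.0 * release_mg_per_day / dose_mg_per_kg_per_day


if __name__ == "__main__":
    # Body weights implied by 0.5 mg/day at 70 and 60 mg/kg/day (not measured values).
    light = implied_body_weight_g(0.5, 70.0)
    heavy = implied_body_weight_g(0.5, 60.0)
    print(f"implied body weight: {light:.1f}-{heavy:.1f} g")
    # Dose for a hypothetical 8.0 g animal.
    print(f"dose at 8.0 g: {dose_mg_per_kg_per_day(0.5, 8.0):.1f} mg/kg/day")
```

Because the pellet releases a fixed mass per day, the weight-normalized dose falls as the animals grow, which is presumably why the text gives it as a range.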

Nuclei Isolation 

Cardiac nuclei were isolated as described previously (Bergmann et al., 2015).
Frozen heart tissue was dissected and homogenized in lysis buffer (0.32 M sucrose,
10 mM Tris-HCl [pH 8], 5 mM CaCl₂, 5 mM MgAc, 2 mM EDTA, 0.5 mM
EGTA, 1 mM DTT) using a T-25 Ultra-Turrax probe homogenizer (IKA, Germany)
at 24,000 rpm for 10 s, followed by homogenization using a type A pestle in a
40 ml glass douncer (VWR) with eight strokes per sample. The nuclear isolates
were filtered through 100-μm and 60-μm nylon mesh cell strainers (BD Biosciences)
and centrifuged at 600 × g for 10 min. The pellets were dissolved in sucrose
buffer (2.1 M sucrose, 10 mM Tris-HCl [pH 8], 5 mM MgAc). Finally, 10 ml
of sucrose buffer was added to BSA-coated ultracentrifuge tubes and overlaid
with the nuclear isolate. The samples were centrifuged at 13,000 × g for 60 min
in a Beckman Avanti Centrifuge (Beckman Coulter), and the nuclear pellets
were dissolved in NSB plus buffer (0.44 M sucrose, 10 mM Tris-HCl [pH
7.2], 70 mM KCl, 10 mM MgCl₂, 1.5 mM spermine). All steps were performed
at 4°C.

Figure 6. Cardiomyocyte DNA Synthesis in
Growing Mouse Hearts

(A-C) The time course and quantification of postnatal
cardiomyocyte DNA synthesis. (B and C)
Cardiomyocyte proliferation, multinucleation, and
polyploidization were observed in the postnatal
murine heart. The magnitude and time course of
the cardiomyocyte number expansion, multinucleation,
and polyploidization were related. The
graphs depict the DNA content changes per unit
time based on stereological estimates and flow
cytometric measurements of the cardiomyocyte
number expansion, multinucleation, and polyploidization
(Figures 1C to E and Figure 2E). The
changes in DNA synthesis were related to the time
around birth (P2). Diploid nuclei: 2n, tetraploid
nuclei: 4n.

BrdU and EdU Detection by Flow Cytometry 

Nuclei were labeled with antibodies against PCM-1 (Santa Cruz, 1:200) overnight,
and PCM-1 was visualized using an appropriate secondary antibody
conjugated to Alexa 488 (Life Technologies). Cardiac nuclei were fixed in
Fix/Perm (BD Biosciences) for 20 min on ice and sorted by flow cytometry
(500,000 per tube) based on a DNA dye (Hoechst 33342). Next, the nuclei
were resuspended in DNase I buffer (20 mM Tris-HCl [pH 8], 2 mM MgCl₂,
50 mM KCl) and incubated with 6 U of DNase I (Life Technologies) per 10⁶
nuclei (37°C, 30 min). We determined that 6 U of DNase I per 10⁶ nuclei is
optimal for BrdU imaging of murine cardiac nuclei. The nuclei were washed
once with Perm/Wash buffer (BD Biosciences) and resuspended in PBS
(50 μl) for staining with BrdU antibodies conjugated to APC (BD PharMingen,
1:50, 30 min, room temperature). PCM-1 and BrdU incorporation were detected
and quantified by flow cytometry. For EdU detection, we used the
Click-iT EdU Alexa Fluor 647 Flow Cytometry Assay kit according to the
manufacturer's instructions (Life Technologies).

MIMS Analysis 

Extracted mouse tissue was minced to generate 1-mm³ cubes and immediately
fixed in 2.5% glutaraldehyde and 4% paraformaldehyde in 0.01 M PBS
buffer at pH 7.2 for 2 hr at room temperature. Cardiomyocyte nuclei were isolated
by flow cytometry and labeled with antibodies against PCM-1 as
described in Experimental Procedures. Nuclei were transferred to 2.5% glutaraldehyde
(Sigma) in 0.1 M PBS and stored in a refrigerator. Following standard
electron microscopy embedding protocols, the tissue or the nuclei were postfixed
in 2% OsO₄ in PBS for 1 hr, followed by a ddH₂O wash and dehydration in
an ascending alcohol series. Tissue pieces were embedded in Agar 100 resin
(Agar Scientific) using propylene oxide as an intermediate agent. The resin was
polymerized at 60°C for 48 hr. Thin sections (150 nm) were deposited on clean
silicon chips and introduced into a NanoSIMS-50 ion microprobe (CAMECA,
Gennevilliers, France) operating in scanning mode (Guerquin-Kern et al.,
2005). For the present study, we used a tightly focused Cs⁺ primary ion
beam, which allowed four secondary ion species (¹²C⁻, ¹²C¹⁴N⁻, ¹²C¹⁵N⁻,
and ³¹P⁻) to be monitored in parallel from the same sputtered volume. The primary
beam steps over the surface of the sample to create images of the
selected ion species. After careful Cs⁺ ion implantation to obtain steady-state
ion emission, a mosaic view of the tissue over a large area was generated using
a relatively high-intensity probe with a typical spot size of 200 nm (distance between
16 and 84% peak intensity from a line scan). The raster size was 80 μm
with an image definition of 256 × 256 pixels and a dwell time of 2 ms per pixel.
These survey images permitted the statistical evaluation of the percentage of
labeled nuclei and enabled the selection of labeled cells for further high-resolution
imaging.

High-resolution images were acquired using multiframe mode. The primary
beam intensity was 1 pA with a typical probe size of ~100 nm. The raster size
was between 50 and 60 μm to image whole cardiomyocytes with an improved
image definition of 512 × 512 pixels. With a dwell time of 2 ms per pixel, up to
20 frames were acquired, and the total analysis time was 2-3 hr. All survey images
and high-resolution images were processed using ImageJ software.
Before calculating local isotopic ratios, each isotopic image was properly
aligned using the TOMOJ plugin (Messaoudii et al., 2007) with images
used as a reference before a summed image was obtained for each ion species.
A ¹⁵N:¹⁴N ratio map was then established from ¹²C¹⁴N⁻ and ¹²C¹⁵N⁻ images
based on pixel-by-pixel division. A sample containing no labeled cells
was used as a working reference to adjust the detectors prior to quantification
of the ¹⁵N:¹⁴N ratios. The final ¹⁵N:¹⁴N ratio map was displayed using
Hue-Saturation-Intensity (HSI) transformation. These HSI color images were generated
using OpenMIMS, an ImageJ plugin developed by Claude Lechene's
laboratory (Lechene et al., 2006). The hue corresponds to the ratio value,
and the intensity at a given hue is an index of the statistical reliability.
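
The pixel-by-pixel ratio map and its HSI display can be illustrated with a small NumPy sketch. This is not the OpenMIMS implementation: the function names are hypothetical, the natural ¹⁵N:¹⁴N ratio of about 0.00367 is a commonly cited approximation used here as an assumed reference, and using the normalized ¹²C¹⁴N⁻ counts as the intensity channel is a simplifying assumption.

```python
import numpy as np

# Approximate natural 15N:14N abundance ratio (assumed reference value).
NATURAL_RATIO = 0.00367


def ratio_map(c15n, c14n):
    """Pixel-by-pixel 12C15N / 12C14N ratio; pixels without 14N counts become NaN."""
    c15n = np.asarray(c15n, dtype=float)
    c14n = np.asarray(c14n, dtype=float)
    out = np.full(c14n.shape, np.nan)
    np.divide(c15n, c14n, out=out, where=c14n > 0)
    return out


def percent_excess(ratio, reference=NATURAL_RATIO):
    """Percent excess of the measured ratio above the reference ratio."""
    return 100.0 * (np.asarray(ratio, dtype=float) - reference) / reference


def hsi_image(ratio, counts, lo=0.0, hi=300.0):
    """Return an HSV array: hue encodes percent excess (blue at lo, red at hi),
    value encodes the normalized counts as a rough reliability index."""
    counts = np.asarray(counts, dtype=float)
    excess = np.clip(percent_excess(ratio), lo, hi)
    frac = (excess - lo) / (hi - lo)
    hue = (2.0 / 3.0) * (1.0 - frac)      # 2/3 is blue, 0 is red in HSV
    hue = np.nan_to_num(hue, nan=0.0)     # undefined pixels get value 0 below
    peak = counts.max()
    value = counts / peak if peak > 0 else np.zeros_like(counts)
    saturation = np.ones_like(counts)
    return np.stack([hue, saturation, value], axis=-1)
```

The 0% to 300% excess scale mirrors the color range quoted in the figure legend; pixels with few counts are rendered dark, so a vivid hue is only shown where the ratio is supported by enough signal.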

Stereological Analysis 

Using a design-based strategy, tissue pieces (1-2 mm diameter) from the left
ventricle (including the septum) were sampled. Tissues were embedded in 8%
gelatin, and isectors (spheres) with a maximum diameter of 4 mm were prepared
to obtain isotropic, uniform random alignment of the samples. The isectors
were embedded, and 40-μm-thick cryosections were stained for stereological
quantitation. Cardiomyocyte nuclei were stained with antibodies
against PCM-1, and nuclei were stained with DRAQ5 (see Immunohistochemistry).
The analysis was performed on an LSM700 confocal microscope (63×
Plan-Apo oil objective) using ZEN2010b software with the NewCast Module
(Visiopharm A/S, version 4.x). A minimum of 3-4 isectors were sampled, and
a minimum of 200 nuclei per animal were counted (1%-2% of the area of
the region of interest). A systematic random sampling scheme (meander sampling)
was applied using the optical dissector with a counting frame (40 μm ×
40 μm × 20 μm, and 3 μm guard zones). We defined local vertical areas where
myocytes had been cut along their longitudinal axis to determine the number of
nuclei per myocyte. Myocyte cell borders were labeled with connexin-43 and
dystrophin. Wheat germ agglutinin (WGA) was added to facilitate the identification
of the cell borders in P2 and P3 animals, and myocyte nuclei were
labeled with PCM-1. To estimate the total numbers of nuclei in the heart, we
utilized the two-step Nv × Vref method (where Nv is an estimate of the numerical
cell density and Vref is the reference volume of the tissue or organ region
of interest) using an optical dissector (Bruel and Nyengaard, 2005). When
necessary, the tissue shrinkage along the z axis was corrected. Tissue
shrinkage along the x and y axes was not observed. The total number of cardiomyocytes
was calculated based on the number of cardiomyocyte nuclei
and the multinucleation level.
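
The arithmetic of the two-step Nv × Vref estimate can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names are hypothetical, the counting-frame dimensions are taken from the text, and the counts, reference volume, and multinucleation fractions in the usage example are invented for demonstration.

```python
def numerical_density(nuclei_counted, n_disectors,
                      frame_x_um=40.0, frame_y_um=40.0, height_um=20.0,
                      z_shrinkage=1.0):
    """Nv in nuclei per mm^3.

    z_shrinkage is the measured section thickness divided by the original
    thickness (1.0 means no shrinkage along z); the sampled height is scaled
    back to its original extent before computing the sampled volume.
    """
    corrected_height_um = height_um / z_shrinkage
    sampled_um3 = n_disectors * frame_x_um * frame_y_um * corrected_height_um
    return nuclei_counted / (sampled_um3 * 1e-9)  # 1 mm^3 = 1e9 um^3


def total_nuclei(nv_per_mm3, vref_mm3):
    """Total number of nuclei in the reference volume: N = Nv x Vref."""
    return nv_per_mm3 * vref_mm3


def mean_nuclei_per_cell(fractions):
    """Mean nuclei per cardiomyocyte from {nuclei per cell: fraction of cells}."""
    total = sum(fractions.values())
    return sum(k * f for k, f in fractions.items()) / total


def total_cardiomyocytes(cardiomyocyte_nuclei, nuclei_per_cell):
    """Cell number from the nuclear number and the multinucleation level."""
    return cardiomyocyte_nuclei / nuclei_per_cell


if __name__ == "__main__":
    nv = numerical_density(nuclei_counted=200, n_disectors=50)   # invented counts
    nuclei = total_nuclei(nv, vref_mm3=80.0)                     # invented volume
    per_cell = mean_nuclei_per_cell({1: 0.1, 2: 0.9})            # invented fractions
    print(nv, nuclei, total_cardiomyocytes(nuclei, per_cell))
```

Dividing the nuclear count by the mean number of nuclei per cell is what makes the multinucleation level matter: the same number of nuclei corresponds to fewer cells as binucleation increases.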

Volume Analysis of Isolated Cardiomyocytes 

We used an adaptation of the protocol reported by Mollova et al. to assess the
volume of the isolated cardiomyocytes (Mollova et al., 2013). Briefly, frozen
hearts were trimmed to 1-mm³ cubes and fixed in 4% paraformaldehyde at
4°C for 2 hr, followed by washing for 2-3 hr in HBSS buffer (Ca²⁺, Mg²⁺). The
buffer was exchanged every hour. For cardiomyocyte isolation, collagenase
B (1.8 mg/ml) and collagenase D (2.4 mg/ml) (Roche) were added to the
HBSS buffer (Ca²⁺, Mg²⁺), and the tissue pieces were incubated for 12 to
24 hr on a slow shaker at 37°C. Isolated cardiomyocytes were concentrated
by spinning at 20 × g (2 min), stained with WGA conjugated to Alexa 547 (Life
Technologies, 1 mg/ml, 1:500), and mounted with ProLong Gold DAPI (Life
Technologies). The entire procedure beginning with the addition of the collagenase
was repeated three times. Z-stack images of cardiomyocytes were
obtained using a Zeiss confocal LSM 700 microscope (63× Plan-Apo oil objective),
and the individual cardiomyocyte volume was measured by the Imaris 8
(Bitplane) 3D image processing software program using the surface module.

Immunohistochemistry 

Mouse hearts were incubated in sucrose solution (30% sucrose in PBS) overnight.
The tissue was embedded in Tissue Tek O.C.T. Compound (Sakura) and
frozen in liquid nitrogen. After cryosectioning (-25°C), 14-μm-thick sections
were fixed in 2% formaldehyde for 20 min. For BrdU detection, tissue sections
were incubated in 2 N HCl at room temperature for 40 min prior to staining, followed
by extensive washing with PBS (pH 7.4). Antibodies against BrdU (Abcam,
rat, 1:250), Ki67-FITC (Abcam, rabbit, SP-6, 1:100), Connexin 43 (Sigma
Aldrich, rabbit, 1:10,000, and Abcam, mouse, 1:200), dystrophin (Atlas Antibodies,
rabbit, 1:2,000, and Abcam, mouse, 1:200), PCM-1 (Santa Cruz, rabbit,
1:200), MHC (Abcam, mouse, 1:200), and cardiac troponin I (Abcam, mouse,
1:250) were used. The antibodies were visualized using appropriate secondary
antibodies conjugated to Alexa 488, 555 and/or 647 (Life Technologies, 1:500).
WGA conjugated to Alexa 555 and 647 (Life Technologies, 1 mg/ml, 1:500) was
used to label the cell borders in P2 and P3 animals. Nuclei were visualized with
DAPI or DRAQ5 (BioStatus), and sections were mounted with ProLong medium
(Life Technologies). Image analysis was performed with a Zeiss LSM
700 confocal microscope using ImageJ software.

Statistics 

Significant differences between continuous variables were determined by
two-tailed paired and unpaired t tests and ANOVA followed by the post hoc
Holm-Sidak test or by two-tailed t tests with Holm-Bonferroni correction. The data
are presented as means with SD (mean ± SD). SigmaPlot 13.0 was used for
statistical analysis and for dynamic fitting of the data. Derivative graphs of
the cardiomyocyte number, cardiomyocyte nuclear number, and ploidy levels
were generated with Mathcad 15.0. DNA content changes were determined
relative to the level on P2. p < 0.05 was considered significant.
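
For readers unfamiliar with the Holm-Bonferroni correction named above, a minimal step-down implementation is sketched here. It is a generic illustration written for this note, not the SigmaPlot procedure used in the study; the function name and the example p values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns (adjusted_p, reject) in the original order of p_values.
    The smallest p value is multiplied by m, the next by m - 1, and so on;
    adjusted values are forced to be non-decreasing and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        scaled = min(1.0, p_values[idx] * (m - rank))
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    # A hypothesis is rejected when its adjusted p value is below alpha.
    reject = [p < alpha for p in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    adj, rej = holm_bonferroni([0.01, 0.04, 0.03])  # invented p values
    print(adj, rej)
```

In this invented example only the smallest p value survives correction, even though all three raw values are below 0.05, which shows why the choice of correction matters when several age groups are compared.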
Erratum

Regulators of Gut Motility Revealed by a Gnotobiotic Model of Diet-Microbiome
Interactions Related to Travel



Neelendu Dey, Vitas E. Wagner, Laura V. Blanton, Jiye Cheng, Luigi Fontana, Rashidul Haque, Tahmeed Ahmed,
and Jeffrey I. Gordon*

*Correspondence: jgordon@wustl.edu
http://dx.doi.org/10.1016/j.cell.2015.10.052

(Cell 163, 95-107; September 24, 2015)

In the above article, we show that changes in diet composition affect gut motility in a microbiota-dependent manner. While describing
the rationale for re-deriving Ret+/- mice as germ free on page 103, right column, lines 6-9, we erroneously indicated that conventionally
raised wild-type mice have slower transit times than their heterozygous littermates. The correct sentence should
have stated, "We found that conventionally raised wild-type (Ret+/+) mice have significantly lower transit times (faster motility) than
their conventionally raised heterozygous (Ret+/-) littermates (p = 0.05, one-tailed Student's t test; as shown in Table S2F)." The published
results of this experiment were correct, and this textual error does not affect the conclusions of the paper. The online version of
the paper has been corrected to reflect this change. We apologize for any confusion that it may have caused.
Snapshot: T Cell Exhaustion

Kristen E. Pauken and E. John Wherry 

Department of Microbiology and Institute for Immunology, University of Pennsylvania, 
Philadelphia, PA 19104, USA 



TEX Are a Discrete T Cell Subset

T cell exhaustion is a distinct differentiation state that can be distinguished from naive, effector, and memory T cells. Compared to effector (TEFF) and memory (TMEM) cells,
exhausted T cells (TEX) display impaired effector functions (e.g., rapid production of effector cytokines, cytotoxicity) (Wherry and Kurachi, 2015). TEX have limited proliferative
potential, especially compared to some subsets of TMEM and naive T cells (TN). TEX also develop a distinct transcriptional program that is unlike that of TN, TEFF, or TMEM
(Doering et al., 2012; Martinez et al., 2015; Paley et al., 2012), a feature shared between chronic infections and cancer (Baitsch et al., 2011). TEX confer only weak or temporary
immune pressure on chronic infections or tumors, and this immunity is ultimately ineffective (Wherry and Kurachi, 2015). TEX can be re-invigorated by blockade of PD-1 and
other inhibitory receptors and immunoregulatory pathways (Barber et al., 2006; Wherry and Kurachi, 2015). This re-invigoration demonstrates that at least a subset of TEX retains
residual protective potential and has been translated into impressive clinical results in cancer (Page et al., 2014). Exhaustion can develop in both CD4+ and CD8+ T cells, though
CD8+ T cell exhaustion is better understood and is the focus of this Snapshot.

Origins of Exhaustion 

Unlike other forms of T cell dysfunction (e.g., anergy), exhaustion is not induced at priming (Wherry and Kurachi, 2015). Longitudinal analyses support the notion of gradual
induction of dysfunction. For example, the transcriptional profiles of virus-specific CD8+ T cells during an acutely resolving versus developing chronic viral infection are largely
similar at days 6 and 8 post-infection (p.i.). At day 15 p.i., these transcriptional programs diverge toward memory or exhaustion (Doering et al., 2012; Wherry and Kurachi, 2015).
Moreover, CD8+ T cells primed during a chronic infection can develop into functional TMEM if removed from the chronic infection at day 8 p.i., but not day 15 or 30 p.i. (Angelosanto
et al., 2012; Brooks et al., 2006). TEX and TMEM also both develop from the CD127+KLRG1- "memory precursor" subset of TEFF, indicating a common developmental origin for TEX and
TMEM (Angelosanto et al., 2012).

Signals for the Development of TEX

While the development of TEX remains incompletely understood, persisting and likely continuous rather than intermittent antigen stimulation appears to be a key signal driving
exhaustion. However, other types of signals are also likely important. These include proinflammatory (e.g., IFN-α/β, IL-6, IL-27) and suppressive (e.g., IL-10, TGF-β) cytokines,
other regulatory leukocytes (e.g., regulatory T cells, immunoregulatory antigen presenting cells), and the tissue microenvironment (e.g., altered hypoxia, nutrients, pH) (Wherry
and Kurachi, 2015). Together with chronic TCR engagement, these signals drive elevated and sustained expression of multiple inhibitory receptors (e.g., PD-1, Lag-3, Tim-3),
altered use of key transcription factors (e.g., T-bet, Eomes, Blimp-1, NFAT/AP-1), changes in metabolism, and a transcriptional program distinct from other T cell differentiation
states. Ultimately, these signals lead to progressive loss of effector functions, altered homeostasis compared to TMEM, and cell death due to overstimulation. As a result, T cell
exhaustion results in poor control of pathogens or tumors. However, the ability to partially reverse exhaustion through strategies including PD-1 pathway blockade suggests that
TEX, or at least a subset of this pool, are not terminally differentiated and can contribute to protective immunity if re-invigorated.

Subsets of TEX

At least two subsets of TEX exist based on expression of PD-1 and the T-box transcription factors T-bet and Eomesodermin (Eomes) (Blackburn et al., 2008; Paley et al., 2012).
One subset expresses high T-bet and intermediate PD-1 (T-bet(hi) PD-1(int)), whereas the other subset expresses high Eomes and high PD-1 but lower T-bet (Eomes(hi) PD-1(hi)). These
two subsets differ in key functions. The T-bet(hi) PD-1(int) subset retains moderate proliferative capacity and some potential to produce effector cytokines (e.g., IFNγ, TNFα) but has
limited cytotoxicity. The Eomes(hi) PD-1(hi) subset produces fewer cytokines and has poor proliferative potential but retains partial cytotoxicity compared to the T-bet(hi) PD-1(int) subset
(Paley et al., 2012). Only the T-bet(hi) PD-1(int) subset can be reinvigorated by PD-1:PD-L1 pathway blockade, an observation with implications for immunotherapy (Blackburn et al.,
2008). A lineage relationship exists between these two subsets: the T-bet(hi) PD-1(int) subset divides in response to persisting antigen, giving rise to Eomes(hi) PD-1(hi) progeny. The
Eomes(hi) PD-1(hi) cells are terminally differentiated and do not convert back to T-bet(hi) PD-1(int) cells (Paley et al., 2012). Thus, the T-bet(hi) PD-1(int) cells are referred to as the "progenitor
subset" and the Eomes(hi) PD-1(hi) cells as the "terminal subset."

Based on Anatomical Location 

There is likely additional heterogeneity in the pool of TEX based on tissue location and disease context. In chronic infection, the T-bet(hi) progenitor subset is found in the spleen
and blood, while the Eomes(hi) PD-1(hi) terminal subset is more abundant in non-lymphoid tissues (Paley et al., 2012) and possibly tumors. However, the distinct microenvironments
of non-lymphoid tissues and tumors are likely to influence TEX biology. Indeed, different tissues have different ratios of T-bet(hi) to Eomes(hi) subsets, and TEX in these locations may
differ in other key phenotypic and functional attributes (Blackburn et al., 2008; Paley et al., 2012). Therefore, while TEX represents a discrete differentiation state compared to TN,
TEFF, and TMEM, the disease context and location may impart additional layers of heterogeneity.